Count mismatch - originals vs. files imported vs. in search #747

Closed
opened 2026-02-19 23:15:36 -05:00 by deekerman · 3 comments

Originally created by @andrewlow on GitHub (Jan 25, 2021).

I just fed a new instance of PhotoPrism (not my first instance; I'm re-importing my library), so I'm actually paying attention to things this time.

I fed the importer 371 files. I checked the number of files with:

```
$ find . -type f | wc -l
371
```

After the import finished, the web UI shows:

Search: 336
Review: 1

Library->Originals: 425

Looking at the filesystem:

```
photoprism/originals$ find . -type f | wc -l
340
```

I get that there may be duplicates, etc. However, these numbers make me scratch my head:

  1. Why is the Originals count in the web UI higher than the number of files I imported? (425 vs. 371)
  2. Why is the Originals count in the web UI higher than the number of files in the `/originals` directory? (425 vs. 340)
  3. Why can I search fewer photos than there are Originals? (336 vs. 425) or (336 vs. 340)

Hmm, now I think I did review 3 photos and approve them, so maybe after the next index I'll be able to search all of the photos in the `/originals` directory: 336 + 1 + 3 = 340.

Something strange is happening here. Maybe I'll kick off a reindex and see if that helps. I could live with some of the 371 photos being duplicates.


@andrewlow commented on GitHub (Jan 25, 2021):

Nope, a reindex does not seem to have helped.

Search: 336
Review: 1
Originals: 425

Actual files on disk in originals: 340

So 3 files in originals aren't photos that I can search or review?
And I have 85 originals that have no files?


@graciousgrey commented on GitHub (Jan 26, 2021):

I guess we need to add this to our FAQs :D

  • You imported 371 files.

  • During the import duplicates are skipped, so you may end up with fewer files physically existing in the originals directory than you fed the importer with. This could explain why you end up with 340 instead of 371.

  • When indexing the files in originals, we create a JPEG version of every file type other than .jpg (e.g. RAWs, videos, PNGs, etc.). With default settings, these JPEGs are stored in `/storage/sidecar`, but the UI shows them in the Originals section and includes them in its count. This explains why you have a count of 425 with 340 files in originals.

  • The search count is smaller than the Originals count because some files are stacked.

    • E.g. a RAW + related JPG + related XMP file = 3 originals but 1 photo
    • E.g. an MP4 + related JPG = 2 originals but 1 photo
    • Multiple .jpg files can also be stacked if they are related
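The two counting rules above can be turned into a rough check against a local folder. This is a minimal POSIX shell sketch, not PhotoPrism's own logic: the function names are hypothetical, it relies on `find -iname` (supported by GNU and BSD find), and it assumes that every non-JPEG original other than an .xmp sidecar gets exactly one generated JPEG, and that files stack only when they share a folder and a base name.

```shell
#!/bin/sh
# Rough estimates only, based on the behaviour described above.

# Estimate the Originals count: files on disk plus one generated JPEG
# for every original that is not a JPEG (assumption: .xmp files get none).
estimate_originals() {
  total=$(find "$1" -type f | wc -l)
  converted=$(find "$1" -type f ! -iname '*.jpg' ! -iname '*.jpeg' ! -iname '*.xmp' | wc -l)
  echo $((total + converted))
}

# Estimate the number of photos after stacking: strip the last extension
# and count distinct "folder/basename" values. PhotoPrism's real rules are
# broader (related JPEGs with different names can stack too).
estimate_stacks() {
  find "$1" -type f ! -iname '*.xmp' | sed 's/\.[^./]*$//' | sort -u | wc -l
}
```

Usage: `estimate_originals photoprism/originals` and `estimate_stacks photoprism/originals`. Treat both numbers as ballpark figures to compare with the UI, not exact predictions.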

@andrewlow commented on GitHub (Jan 26, 2021):

Recap: total source files:

```
$ find . -type f | wc -l
371
```

BTW, I do not have any stacked photos, but I can see how stacking may shift the counts.

Ahh, OK, so I have a number of .NEF files in this tree, 89 to be exact:

```
$ ls -lR | grep --count '\.NEF$'
89
```

So I should expect some extra entries ("duplicates") from the JPEGs generated for those NEFs.

Looking at the other file types in that tree:
12 .jpg
269 .JPG
1 .jpeg

89 + 12 + 269 + 1 = 371.
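The per-extension tally above can also be produced in one pass instead of one grep per type. A minimal sketch using only standard find, sed, sort and uniq; `count_by_extension` is a hypothetical helper name. It is case-sensitive, so .jpg and .JPG are counted separately, as in the list above.

```shell
#!/bin/sh
# Count files per extension under a directory, one "<count> <ext>" line each.
# Only files whose name contains a dot are counted.
count_by_extension() {
  # The greedy .* strips everything up to the last dot in the path,
  # which is the one in the file name because -name guarantees it has a dot.
  find "$1" -type f -name '*.*' | sed 's/.*\.//' | sort | uniq -c
}
```

Summing the counts, e.g. `count_by_extension . | awk '{ s += $1 } END { print s }'`, should match the `find . -type f | wc -l` total whenever every file has an extension.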

Examining duplicates by running sha256 over the files, I can spot a few by eye:

```
c71a1a01d6fa1afe54a266b16e0a6880fb9ff7f837bb8f4e179ff7f59c5d5afe  ./03/05/20200305-210444/DSC_0359.JPG
c71a1a01d6fa1afe54a266b16e0a6880fb9ff7f837bb8f4e179ff7f59c5d5afe  ./03/22/20200322-164622/DSC_0359.JPG
```

A bit of magic over that hash file:

```
$ cut -d ' ' -f 1 hash.txt | sort | uniq | wc -l
336
```
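The thread does not show how hash.txt was built. One possible way to build it and to list the duplicate groups is sketched below, assuming GNU coreutils (`sha256sum` and `uniq -w`/`--all-repeated` are GNU tools and options); `hash_tree`, `count_unique` and `list_duplicates` are hypothetical helper names.

```shell
#!/bin/sh
# Write "<sha256>  <path>" for every file under $1 into $2.
# Keep $2 outside $1, otherwise the hash list would end up hashing itself.
hash_tree() {
  find "$1" -type f -exec sha256sum {} + > "$2"
}

# Number of distinct file contents (same idea as the cut | sort | uniq | wc pipeline).
count_unique() {
  cut -d ' ' -f 1 "$1" | sort -u | wc -l
}

# Print every file whose content appears more than once,
# comparing only the 64-character digest and separating groups with blank lines.
list_duplicates() {
  sort "$1" | uniq -w 64 --all-repeated=separate
}
```

Usage: `hash_tree . /tmp/hash.txt && count_unique /tmp/hash.txt && list_duplicates /tmp/hash.txt`. Listing the groups makes it easy to see which copies the importer would skip.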

Look at that: 336 is a familiar number. That's how many unique files I have. It doesn't explain the 1 photo in review, but we're really close.

336 + 89 = 425

OK, so the 425 Originals count is validated.

There is only one mystery left, and I can dig into it myself: how do I have 336 in search plus 1 in review when my source appears to have 336 unique photos? It feels like an off-by-one error. In any case, the file in review has a funky (bad) date stamp in its EXIF data.

I'll leave this open in case you want to tie the FAQ update to this issue. Otherwise please feel free to close.
