Faces: Improve performance and strategy for manual tagging #1679

Closed
opened 2026-02-20 00:18:58 -05:00 by deekerman · 34 comments

Originally created by @maxime1992 on GitHub (Jan 17, 2023).

Originally assigned to: @theshadow27, @graciousgrey, @omerdduran on GitHub.

Let me start by saying this is a new issue that summarizes a closed, but still occurring, issue (https://github.com/photoprism/photoprism/issues/2806). I'm creating this new issue based on @lastzero's comment (https://github.com/photoprism/photoprism/issues/2806#issuecomment-1383165029).

1. What is not working as documented?

For faces that were not added automatically, tagging people manually is slow to extremely slow. With some photos or persons (I'm unsure which), it takes only a few seconds (rare), but once the issue appears, it usually takes between 1 and 2 minutes to tag one person. I recently decided to move all my photos to PhotoPrism, so I already have quite a lot, and quite a lot more to come. I have more than a couple hundred pictures with unrecognized faces waiting to be tagged, and waiting 1 to 2 minutes for each would be unmanageable.

It's documented as a known issue (https://docs.photoprism.app/known-issues/#removing-merged-clusters-fails). I have tried the command docker compose exec photoprism photoprism faces audit --fix, but it didn't fix the issue.

2. How can we reproduce it?

I have no idea how exactly to reproduce this, but both @pjft and I have shared parts of our databases directly by email to help debug it. See pjft's comment (https://github.com/photoprism/photoprism/issues/2806#issuecomment-1382955746) and mine (https://github.com/photoprism/photoprism/issues/2806#issuecomment-1383100945). In the best case that's enough to debug; if not, I'm more than happy to provide more info.

3. What behavior do you expect?

When a face is detected but unknown, it shouldn't take more than a few seconds to add it manually.

4. What could be the cause of your problem?

Really unsure, sorry.

5. Can you provide us with example files for testing, error logs, or screenshots?

As stated above, this has been done by email as it includes personal data.

6. Which software versions do you use?

(a) PhotoPrism Architecture & Build Number: AMD64, ARM64, ARMv7,...

Build 221118-e58fee0fb

(b) Database Type & Version: MariaDB, MySQL, SQLite,...

MariaDB

(c) Operating System Types & Versions: Linux, Windows, Android,...

Linux

(d) Browser Types & Versions: Firefox, Chrome, Safari on iPhone,...

Brave and Chrome

(e) Ad Blockers, Browser Plugins, and/or Firewall Software?

Probably irrelevant here (I've tried turning them off; it doesn't change anything).

7. On what kind of device is PhotoPrism installed?

This is especially important if you are reporting a performance, import, or indexing issue. You can skip this if you're reporting a problem you found in our public demo, or if it's a completely unrelated issue, such as incorrect page layout.

(a) Device / Processor Type: Raspberry Pi 4, Intel Core i7-3770, AMD Ryzen 7 3800X,...

Intel(R) Core(TM) i3-3227U CPU @ 1.90GHz

(b) Physical Memory & Swap Space in GB

4 GB and 4 GB

(c) Storage Type: HDD, SSD, RAID, USB, Network Storage,...

SSD

(d) Anything else that might be helpful to know?

8. Do you use a Reverse Proxy, Firewall, VPN, or CDN?

Yes, I use SWAG which itself uses NGINX, but it's probably unrelated here as it used to work perfectly with that config.


I'll summarize what was found in the previous issue:

  • There's a suspicion (https://github.com/photoprism/photoprism/issues/2806#issuecomment-1310593059) that this issue may arise when PhotoPrism thinks there are different faces assigned to the same person
  • Running a "faces reset" works... but for the reporter of the previous issue the problem came back, so it's not really a viable option
  • It was asked (https://github.com/photoprism/photoprism/issues/2806#issuecomment-1313610652) whether it could come from running multiple instances or from directly modifying database content; neither is the case for me
  • The original issue was closed because nobody shared their database; we've now sent 2 of them for debugging

@mdmarwil commented on GitHub (Jan 30, 2023):

I am having this same issue, and the command docker compose exec photoprism photoprism faces audit --fix does not help.

My background is mostly in Python, but I tried to do some digging through the code.

The warning message faces: failed removing merged clusters for subject comes from the following code block in internal/query/faces.go:

// PurgeOrphanFaces removes unused faces from the index.
if removed, err := PurgeOrphanFaces(merge.IDs()); err != nil {
return merged, err
} else if removed > 0 {
log.Debugf("faces: removed %d orphans for subject %s", removed, clean.Log(subjUID))
} else {
log.Warnf("faces: failed removing merged clusters for subject %s", clean.Log(subjUID))
}

Just prior to this block of code, MatchMarkers() is called. It is trying to match 1 Face to 4 other Faces with the same subj_uid, but it looks like the distance between the embeddings of the 4 other Faces and the first Face is too large for them to be considered a match. So no match completes.

Going back to the block of code shown above that means:

  • No error was encountered
  • The number removed is 0, since no faces were reassigned and thus orphaned

So it falls into the else block with a warning message, although in this case a more accurate message would state that there were no orphaned clusters to remove.
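The failure mode described above can be illustrated with a minimal sketch. The function and variable names and the threshold below are assumptions for illustration, not PhotoPrism's actual constants: when every candidate embedding is too far from the reference face, nothing is reassigned, nothing becomes an orphan, and the `removed == 0` branch logs the misleading warning.

```go
package main

import (
	"fmt"
	"math"
)

// dist is a hypothetical stand-in for the embedding distance check that
// MatchMarkers performs: Euclidean distance between two face embeddings.
func dist(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

func main() {
	threshold := 0.64 // illustrative value, not the real matching constant
	reference := []float64{0.1, 0.2, 0.3}
	candidates := [][]float64{
		{0.9, 0.8, 0.7}, // far from the reference embedding
		{0.8, 0.9, 0.6},
	}
	matched := 0
	for _, c := range candidates {
		if dist(reference, c) < threshold {
			matched++
		}
	}
	// No candidate is close enough, so no face is reassigned and no
	// cluster is orphaned -- removed stays 0 and the warning fires
	// even though no error occurred.
	fmt.Println("matched:", matched) // prints: matched: 0
}
```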


@dalfry commented on GitHub (Feb 5, 2023):

Running into this issue. Tried running the audit fix command with no change. I have the merge failed error for one single subject. Library has 100k+ files. Spent a week tagging faces. Would prefer not to reset faces. :-)

BTW, Thanks for all your work! Photoprism is awesome! and I am a sponsor as well.

TRAC[2023-02-05T19:31:33Z] config: defaults file /etc/photoprism/defaults.yml does not exist
INFO[2023-02-05T19:31:33Z] config: case-insensitive file system detected
DEBU[2023-02-05T19:31:33Z] config: running on 'Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz', 4.1 GB memory detected
DEBU[2023-02-05T19:31:33Z] settings: loaded from /photoprism/storage/config/settings.yml
DEBU[2023-02-05T19:31:33Z] config: successfully initialized [35.329727ms]
INFO[2023-02-05T19:31:33Z] migrate: executing pre migrations
DEBU[2023-02-05T19:31:33Z] migrate: found 2 previous migrations
DEBU[2023-02-05T19:31:33Z] migrate: 20221015-100000 skipped
DEBU[2023-02-05T19:31:33Z] migrate: 20221015-100100 skipped
INFO[2023-02-05T19:31:34Z] migrate: executing main migrations
DEBU[2023-02-05T19:31:34Z] migrate: found 21 previous migrations
DEBU[2023-02-05T19:31:34Z] migrate: 20211121-094727 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20211124-120008 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20220329-030000 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20220329-040000 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20220329-050000 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20220329-060000 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20220329-061000 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20220329-070000 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20220329-071000 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20220329-080000 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20220329-081000 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20220329-083000 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20220329-090000 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20220329-091000 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20220329-093000 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20220421-200000 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20220521-000001 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20220521-000002 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20220521-000003 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20220927-000100 skipped
DEBU[2023-02-05T19:31:34Z] migrate: 20221002-000100 skipped
TRAC[2023-02-05T19:31:34Z] migrate: migrations migrated
TRAC[2023-02-05T19:31:34Z] migrate: auth_sessions migrated
TRAC[2023-02-05T19:31:34Z] migrate: countries migrated
TRAC[2023-02-05T19:31:34Z] migrate: labels migrated
TRAC[2023-02-05T19:31:35Z] migrate: photos_labels migrated
TRAC[2023-02-05T19:31:35Z] migrate: reactions migrated
TRAC[2023-02-05T19:31:35Z] migrate: files_share migrated
TRAC[2023-02-05T19:31:35Z] migrate: files_sync migrated
TRAC[2023-02-05T19:31:35Z] migrate: details migrated
TRAC[2023-02-05T19:31:35Z] migrate: photos_albums migrated
TRAC[2023-02-05T19:31:35Z] migrate: auth_users_shares migrated
TRAC[2023-02-05T19:31:35Z] migrate: places migrated
TRAC[2023-02-05T19:31:35Z] migrate: lenses migrated
TRAC[2023-02-05T19:31:35Z] migrate: keywords migrated
TRAC[2023-02-05T19:31:35Z] migrate: auth_users migrated
TRAC[2023-02-05T19:31:35Z] migrate: folders migrated
TRAC[2023-02-05T19:31:36Z] migrate: duplicates migrated
TRAC[2023-02-05T19:31:36Z] migrate: photos_users migrated
TRAC[2023-02-05T19:31:36Z] migrate: albums migrated
TRAC[2023-02-05T19:31:36Z] migrate: albums_users migrated
TRAC[2023-02-05T19:31:40Z] migrate: photos_keywords migrated
TRAC[2023-02-05T19:31:40Z] migrate: links migrated
TRAC[2023-02-05T19:31:40Z] migrate: subjects migrated
TRAC[2023-02-05T19:31:40Z] migrate: markers migrated
TRAC[2023-02-05T19:31:40Z] migrate: auth_users_settings migrated
TRAC[2023-02-05T19:31:40Z] migrate: services migrated
TRAC[2023-02-05T19:31:40Z] migrate: errors migrated
TRAC[2023-02-05T19:31:40Z] migrate: passwords migrated
TRAC[2023-02-05T19:31:40Z] migrate: cells migrated
TRAC[2023-02-05T19:31:40Z] migrate: cameras migrated
TRAC[2023-02-05T19:31:40Z] migrate: categories migrated
TRAC[2023-02-05T19:31:40Z] migrate: faces migrated
TRAC[2023-02-05T19:31:40Z] migrate: auth_users_details migrated
TRAC[2023-02-05T19:31:41Z] migrate: files migrated
TRAC[2023-02-05T19:31:41Z] migrate: photos migrated
DEBU[2023-02-05T19:31:41Z] migrate: completed in 7.796722162s
INFO[2023-02-05T19:31:41Z] faces: found 104 subjects
INFO[2023-02-05T19:31:41Z] faces: found no invalid marker subjects
INFO[2023-02-05T19:31:41Z] faces: found no invalid marker faces
INFO[2023-02-05T19:32:10Z] faces: found no ambiguous subjects
INFO[2023-02-05T19:32:10Z] faces: found no orphan face clusters
INFO[2023-02-05T19:32:10Z] faces: found no orphan people
INFO[2023-02-05T19:32:10Z] completed in 36.75042936s
INFO[2023-02-05T19:32:10Z] closed database connection


@lastzero commented on GitHub (Feb 23, 2023):

I now have a clearer idea of the underlying problem and think we can improve the situation by not matching manually labeled faces if they do not meet the constraints set for automatic clustering (more specifically, cluster core and core distance).

When testing with the pictures we had been using during development, a more aggressive approach for manually labeled faces seemed to work well, but this is only feasible for faces that the model can categorize well. For example, all these faces were matched after labeling a single face in the edit dialog:

[Image: single-w] (https://user-images.githubusercontent.com/301686/221028034-ca405fd0-8598-46be-a75a-fd5d12c9525e.jpg)
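The gating idea described in this comment could be sketched as follows. All names and thresholds here are assumptions for illustration, not PhotoPrism's actual implementation: a manually labeled face would only participate in automatic matching once its cluster satisfies the same constraints used for automatic clustering (enough samples, all within the core distance).

```go
package main

import "fmt"

// cluster is a hypothetical summary of a face cluster.
type cluster struct {
	samples  int     // number of labeled face samples in the cluster
	coreDist float64 // max distance of member embeddings from the cluster core
}

// eligibleForAutoMatch reports whether a manually labeled cluster meets
// the (assumed) constraints set for automatic clustering, and should
// therefore be used for automatic matching.
func eligibleForAutoMatch(c cluster, minSamples int, maxCoreDist float64) bool {
	return c.samples >= minSamples && c.coreDist <= maxCoreDist
}

func main() {
	// Values are illustrative only.
	single := cluster{samples: 1, coreDist: 0.0}
	solid := cluster{samples: 5, coreDist: 0.4}
	fmt.Println(eligibleForAutoMatch(single, 3, 0.5)) // false: too few samples yet
	fmt.Println(eligibleForAutoMatch(solid, 3, 0.5))  // true: cluster is well-formed
}
```

This matches the expectation stated later in the thread: a few more faces of the same person must be tagged before automatic matching starts.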


@pjft commented on GitHub (Feb 24, 2023):

Thank you for the update - this looks promising! What consequences would this change bring about? Meaning, what will change by not matching manually labeled faces if they do not meet the constraints set for automatic clustering?

Just curious to understand the "why the original decision, what problems did it solve" and "what consequences does it have".

Not urgent, by any means - just intellectual curiosity. :)

Thank you, and happy Friday!


@lastzero commented on GitHub (Feb 24, 2023):

@pjft We can't tell what the final effect will be until it's implemented. It is expected that you will have to tag a few more faces that belong to the same person before the automatic matching starts, so basically how most other apps work.

Since these changes require extensive testing, we will probably wait until after the next stable release to avoid delaying it even further.

The sample faces we use for testing can be downloaded here:

  • https://afad-dataset.github.io/
  • https://github.com/afad-dataset/tarball-lite

@pjft commented on GitHub (Feb 24, 2023):

Of course - appreciate the time and effort in looking into this.

Let us know how we can help.


@pjft commented on GitHub (May 19, 2023):

Just wanted to say that this seems to be solved in the latest builds!

Awesome work :) It's a breeze to label hundreds of clusters now.


@lastzero commented on GitHub (May 19, 2023):

@pjft Thank you! That's good to know, since the feedback we've received on the optimizations hasn't been very enthusiastic so far. Most users didn't seem to notice (or care about) the changes when they tested our development preview builds.


@pjft commented on GitHub (May 19, 2023):

I have no issues whatsoever now, whereas I recall previously having to wait a minute or so before being able to move on.

The only comment I have is that, for some reason, the "audit fix" and "optimize faces" commands seem to cause more long-lasting issues than the ones I experience without them. Unsure whether that's temporary or not, but I haven't dug into it, as it's painful to go back and redo all the work 😅


@lastzero commented on GitHub (May 19, 2023):

We will change how manually labeled faces are clustered and compared in one of the upcoming releases. This should solve the remaining issues.


@pjft commented on GitHub (May 19, 2023):

Awesome:) if you need any testing at any point, by all means tag me directly. I'm always happy to help if I can. I appreciate all the work you and @graciousgrey put into making Photoprism great! Thank you so much.


@berturu commented on GitHub (Jul 24, 2023):

Unfortunately, I still experience a delay when tagging new faces. The delay ranges from 2-3 seconds to possibly a full minute at times. Running Build 1.2306.25.


@lastzero commented on GitHub (Jul 24, 2023):

Unfortunately, we haven't had time to work on this yet. We will reference this issue in the related commits and mention it in the release notes once the improvements are available. Alternatively, feel free to work on this and provide us with a pull request that we just need to merge. Thank you! ❤️


@judej commented on GitHub (Aug 30, 2023):

+1 on the perf issue. When I manually tag faces, it can take about 20 seconds before I can tag another one. If I try tagging anyway, it gives me a "busy, try again later" error.

I have found that if I navigate in reverse order when tagging, the tagging is sometimes faster (sub-second). I get to tag about 4 or so pics and then it slows down again.

not sure if this helps...


@pabsi commented on GitHub (Oct 2, 2023):

Hi there @lastzero, the PhotoPrism team, and its awesome community,
I was about to raise a new issue, but I think this one fits my use case.

I have a huge number of photos for which I have to tag faces manually. It seems that TensorFlow does not play nicely with children growing from the baby stage into toddlers, then kids, then teenagers. The face change is obvious, but it does require more manual intervention from me. Also, going to public places (a zoo, a museum, a restaurant, etc.) catches lots of unwanted people.

I often find myself having to edit many faces per photo, and this process is extremely tedious and slow. In my humble opinion, we should ideally be able to make all changes to one photo (add, remove, modify face tags) and then commit them to the DB when leaving the photo. The way it works now (make a change, wait for it to be committed) is very slow and underperforming.

Unsure if this GitHub issue is for the same purpose, but thought I'd mention it here first, before raising a new issue unnecessarily.

Thank you so much for this amazing piece of software 🙏


@richardpickett commented on GitHub (Oct 6, 2023):

I'm having this same issue. 26,000+ photos, mostly Asian family. I'm happy to provide as much info as possible, if it's helpful in resolving this issue.

running photoprism/photoprism:ubuntu on a k8s w/ mariadb:10.11

version 230923-e59851350-Linux-AMD64-Plus

Sometimes manually tagging will take 6+ minutes (update: 8+ minutes), with usual wait times around 2 minutes. During the wait I can see 20+ messages like this:

time="2023-10-06T15:53:24Z" level=warning msg="faces: failed removing merged clusters for subject js1xgms3dgksr08g"

Each with a different subject name.

I've run:
photoprism migrations run -f - no output, completes in less than a minute

photoprism index --cleanup - (a) warnings on mp4s that are too big, (b) errors on jpgs making thumbnails (unexpected EOF while decoding), and (c) a lot of failed removing merged clusters for subject <****>

I'm adept on mysql, k8s, bash, etc, so I'm completely comfortable with low-level debugging to help resolve this - just warn me before we do something that may potentially be destructive (also, would like to know where the db backup saves so I can copy it off as needed).


@stephanvierkant commented on GitHub (Jan 18, 2024):

I'm having the same issue. It takes about 30-60 seconds to add faces. While it would be great to improve the performance, could it be possible to have it done by a background process? I think it shouldn't block the front end.


@graciousgrey commented on GitHub (Jan 18, 2024):

We will take care of this as soon as we have released the new authentication options we are currently working on :)


@trop1kal commented on GitHub (Jan 28, 2024):

Awesome! I notice the delay when manually tagging faces as well.
It takes upwards of 2-3 minutes in most cases.

You guys rock! TY in advance!


@niclan commented on GitHub (Mar 13, 2024):

Same here: it takes minutes, with server CPU usage around 30%+ and then peaking at 100% right before finishing. And sometimes it takes just seconds.

It seems like the server is blocking while this is going on.

A bit under 150K images.


@nodecentral commented on GitHub (May 20, 2024):

Just wanted to add to this and share how inconsistent it can be when manually tagging (or de-tagging) faces. Since this is such a key part of the product for me (and helps my aging parents remember people from their past), anything that can be done to speed things up would be amazing.

The face tagging feature is the most important feature for a number of people - thank you so much for making it available.


@theshadow27 commented on GitHub (Dec 20, 2024):

This has been bugging me for a while, but it's now taking 200-300 seconds per face. That's enough time to start looking through the code.

Background

The call that starts the blocking operation, and subsequently fails with a 429, is PUT /api/v1/markers/:marker_uid (the controller is func UpdateMarker(router *gin.RouterGroup) at internal/api/markers.go:182). The 429 error is returned on line 191 when mutex.UpdatePeople.Start() is already locked. The next line uses defer mutex.UpdatePeople.Stop() to keep the mutex held until the UpdateMarker function returns.

The issue is that the remainder of the function can take a VERY long time to return - upwards of 3 minutes. The application exhibits both CPU utilization (100% of 1 core) and DB traffic (distinct spikes of queries). As best I can tell from the logs, when SaveForm reports changed (which will almost always be true when adding a name to a face that did not have one before), we do the following:

  1. get.Faces().Optimize() (when marker.FaceID != "" && marker.SubjUID != "" && marker.SubjSrc == entity.SrcManual, which is again always true from what I can gather based on the front end, since we passed the marker and subject in with the call and the src is always 'manual')
  2. query.UpdateSubjectCovers(true) - seems like a single, well-optimized query
  3. entity.UpdateSubjectCounts(true) - also a single, well-optimized query
  4. query.PhotoByUID(file.PhotoUID) - also a single query
  5. p.UpdateAndSaveTitle() - slightly more complicated, but nothing alarming

Based on log line timing, I suspect that most of the time is being spent in (1), Faces().Optimize(), which is invoked on every single successful call to PUT markers.

faces_optimize.go : Optimize()

The Optimize() function is scary for something routinely invoked by a front-end client. First off, it starts with a loop of up to 11 iterations (although there is an early abort, this is a lot of effort). Up to 11 times, until no progress is made, we:

  1. Query EVERY manually added face from the DB: ManuallyAddedFaces(false, false), basically SELECT * FROM faces WHERE face_hidden=false AND face_src="manual" AND subj_uid <> '' AND face_kind <= 1 ORDER BY subj_uid, samples DESC. Remember - up to 10 times! Why?!? We are not manually adding new faces; the mutex has locked us out of doing so.
  2. For EACH n manually added face from the DB (mine has n=322 rows):
  • merge all faces where the subject has the same UID and continue (fine)
  • run merge[0].match(faces[j].embedding()) (will come to this later). If that returns ok then print a log line and append the face to the merge
  • while the faces are the same, try to merge them in the DB using query.MergeFaces - more on this next
  3. Return the result, which has a single property Merged, an int - a count of the number merged.
  4. Print the count of merged faces via log.Infof("faces: merged %s", english.Plural(res.Merged, "cluster", "clusters")) so that we know that Photoprism is ready to accept the next name (3 minutes later).

In summary, we loop up to 11*322 times, calling match up to len(faces)-len(subjects) times (322-135 = 187 for me) and MergeFaces up to len(faces)-2*len(subjects) times (per the log lines, this seems to be about 121-122).
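As a sanity check on that accounting, here is a tiny, self-contained Go model of the loop described above (simulateOptimize and mergeablePerPass are illustrative names, not PhotoPrism code): if the clusters reported as merged are never actually removed, the early-exit check can never fire and all 11 passes always run.

```go
package main

import "fmt"

// simulateOptimize mirrors the shape of the Optimize() loop: up to 11 passes,
// with an early exit when a pass reports no new merges. If nothing is ever
// actually deleted, every pass reports the same merges again, result.Merged
// keeps growing, and the early exit never triggers.
func simulateOptimize(mergeablePerPass int) (iterations, mergesReported int) {
	for i := 0; i <= 10; i++ { // up to 11 passes, as in faces_optimize.go
		c := mergesReported
		mergesReported += mergeablePerPass // result.Merged += len(merge)
		iterations++
		// Done? Exit only when this pass reported no new merges.
		if mergesReported <= c {
			break
		}
	}
	return iterations, mergesReported
}

func main() {
	it, _ := simulateOptimize(121) // 121 "merges" reported per pass
	fmt.Println("passes with phantom merges:", it)
	it, _ = simulateOptimize(0) // nothing mergeable at all
	fmt.Println("passes when nothing merges:", it)
}
```

Under this model, phantom merges force all 11 passes, while a genuinely clean library exits after a single pass.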

The 10 times is important - I am moderately confident that this loop never exits early (at least on my setup). See the DB traffic graph below for each name-adding operation:

[Screenshot: DB traffic graph recorded during each name-adding operation]

There are 11 distinct spikes of "UPDATE" statements (peak = 220/second).

What happens in MergeFaces

Yikes. Remember this happens roughly 11 * (number of subjects) times for each PUT API call:

  1. Allocate memory and serialize the embeddings (entity.NewFace)
  2. DB call to find the first face with the subject UID or create it if it does not exist FindFirstOrCreate
  3. Attempt to match markers from the IDs via MatchMarkers, which does another query for all markers with the faceIDs, running Match against each one - and potentially calling SetFace, which then calls ResolveCollision, which AGAIN matches all the embeddings and might write to the DB. In my case this is rare, but it does happen. This means that most of the cycles are wasted.
  4. Purging orphan faces via PurgeOrphanFaces - this seems like an over-eager operation considering how many times it is run.

The end result is thousands of UPDATE operations on the DB, along with substantial CPU time spent in Match(), for each PUT API call.

Why Optimize Every Call?

TBH I have no clue. I put a lot of the source through GPT4o and asked, and it said:

Running the Optimize() function every time a face is tagged is likely an overly cautious approach intended to ensure consistency in the face clustering database. However, this is a highly inefficient design for the following reasons:

  1. Redundancy in Optimization: The optimization process (which appears to cluster faces and update subject-related data) does not need to be run for every face tagging event. Most likely, the state of the system doesn’t change drastically with a single new tag, especially for large datasets.
  2. Expensive Operations:
  • The clustering (Optimize()) is an iterative and computationally expensive process, particularly for large datasets.
  • Functions like UpdateSubjectCovers() and UpdateSubjectCounts() involve database updates and recalculations that can add significant overhead when called repeatedly.
  3. Mutex Locking (mutex.UpdatePeople.Start()): This locking mechanism prevents concurrent updates to ensure data consistency. While this makes sense to avoid race conditions, it leads to throttling issues (429 Too Many Requests) when the clustering is invoked excessively.

The logic may have been implemented this way to handle:

  • Immediate consistency: Ensuring that all face tags are optimized and clustering results are updated instantly after every change.
  • Avoidance of stale data: The developers might have been overly concerned about stale data in subjects or clusters, leading them to enforce Optimize() every time.

Its guess is as good as mine. I do agree that it does not need to run on each tagging operation. Perhaps it means less efficient clusters for the tagging session - fine. I can enter 50 names in the time it takes for one Optimize cycle to complete!

How to not Optimize on every call

  1. Batch Optimization: Instead of running Optimize() every time a face is tagged, queue the optimization task and run it periodically (e.g., every few minutes) or when a batch of changes is detected.
  2. Health Metrics: Run the Optimize call when a specific metric, which is inexpensive to calculate, indicates that the matching would benefit from optimization. Candidates include:
  • Clustering state (average size of GROUP BY subject_uids in the faces table)
  • Tagging a new subject (that did not exist before), once it's the third or fourth image of them
  3. Async Optimization and Partial Updates:
  • Immediately tag the face and return
  • Run Optimize() asynchronously, updating only the affected clusters/subjects
  • Use a message queue pattern with a deduplicating filter to avoid unnecessary work
  4. User-Triggered Optimization:
  • Use user behavior cues on the front-end (such as having the edit window open and clicking previous/next photo) to determine that the user is "tagging faces". When this happens, start a timer; each next/previous load restarts it. Run Optimize only when there have been no API calls for some time (5 minutes, or user-adjustable)
  • Allow a user-initiated optimization via a button in the UI.

If it is necessary to Optimize on every call...

  1. Verify that the loop early termination condition is working correctly. Mine never seems to finish in less than 11 cycles.
  2. Potentially use the new VEC_DISTANCE in MariaDB 11.7 (and 10.12) to reduce the number of markers that need to be Matched - I believe there is a separate ticket #4669 for this.
  3. Rather than retrying a merge of every single cluster each time, can we focus on just the subject_uid that changed?
  4. Something else?

In conclusion

Regarding @lastzero 's comment:

Unfortunately, we haven't had time to work on this yet. We will reference this issue in the related commits and mention it in the release notes once the improvements are available. Alternatively, feel free to work on this and provide us with a pull request that we just need to merge. Thank you! ❤️

I am happy to help, but would appreciate some thoughts on the best strategy first so that it does not end up as a wasted effort. Do you feel that it is critical to Optimize on every PUT API call? If so, why? If not, what is your preferred pattern for hiding this latency from the user? My goal is to contribute something usable, not add to the pile of 30-something unmerged PRs.

Edit: The loop actually runs 11 times, not 10 (because it's j <= 10). This matches the number of huge query spikes seen during the Optimize run in DBeaver.


@theshadow27 commented on GitHub (Dec 20, 2024):

To further investigate, I used DBeaver to export the faces table before and after one of these multi-minute, CPU intensive Optimize() calls. The net diff was a single row (the face that was changed), which had only a single field added (subject_uid).

[Screenshot: DBeaver diff of the faces table showing a single changed row]

From this, I have to assume that the clusters aren't actually being merged in a meaningful way in the long-running Optimize() calls, despite the logs reporting that they are. Perhaps this is a bug, and if the clusters were actually merged, then there would not be repeated "faces: merged 121 clusters [2m30.469009483s]" lines in the log...

2024-12-20 17:05:55 time="2024-12-20T22:05:55Z" level=info msg="changes successfully saved"
2024-12-20 17:09:07 time="2024-12-20T22:09:07Z" level=info msg="faces: merged 120 clusters"
2024-12-20 17:09:07 time="2024-12-20T22:09:07Z" level=info msg="changes successfully saved"
2024-12-20 17:15:34 time="2024-12-20T22:15:34Z" level=info msg="faces: merged 121 clusters [2m30.469009483s]"
2024-12-20 17:19:29 time="2024-12-20T22:19:29Z" level=info msg="faces: updated 1 marker, recognized 1 face, 0 unknown [3m55.490692353s]"
2024-12-20 17:19:37 time="2024-12-20T22:19:37Z" level=info msg="index: updated 8 photos [7.306792706s]"
2024-12-20 17:33:05 time="2024-12-20T22:33:05Z" level=info msg="changes successfully saved"
2024-12-20 17:33:21 time="2024-12-20T22:33:21Z" level=info msg="changes successfully saved"
2024-12-20 17:35:48 time="2024-12-20T22:35:48Z" level=info msg="changes successfully saved"
2024-12-20 17:36:18 time="2024-12-20T22:36:18Z" level=info msg="changes successfully saved"
2024-12-20 17:36:22 time="2024-12-20T22:36:22Z" level=info msg="changes successfully saved"
2024-12-20 17:39:13 time="2024-12-20T22:39:13Z" level=info msg="faces: merged 121 clusters"
2024-12-20 17:39:14 time="2024-12-20T22:39:14Z" level=info msg="changes successfully saved"
2024-12-20 17:45:33 time="2024-12-20T22:45:33Z" level=info msg="faces: merged 121 clusters [2m29.398296581s]"

@theshadow27 commented on GitHub (Dec 21, 2024):

I think I found the bug causing the Optimize loop to not terminate early (and run 11 times, i := 0; i <= 10; i++):

The early termination code:

    // Done?
    if result.Merged <= c {
        break
    }

checks to see if any faces were merged on the last run through the loop. Basically, if a call to query.MergeFaces(merge, false) with mergeable faces (clusters that Match with merge[0]) does not error, we assume that all faces were merged with result.Merged += len(merge).

This is a flawed assumption because:

  • for a valid (matching subj_uid and embedding Match) set of mergeable faces, MergeFaces is not guaranteed to actually eliminate any (or all) of the supplied Faces
  • Specifically, in MatchMarkers, the "old, suboptimal" cluster face_id is not replaced when m.Match(marker.Embeddings()) is not ok; in this case we // Ignore on face.go:L265.
  • So if the merged cluster does NOT result in EVERY face_id being replaced with the new super-cluster, then PurgeOrphanFaces cannot delete the old clusters.
  • We try to check this with an if removed > 0 call in faces.go:L216, which suggests we should get an error log when this happens - yet we don't see any such errors. Except:
  • In the PurgeOrphanFaces function, there appears to be a subtle bug where if no rows are affected by the delete, i.e. else if result.RowsAffected > 0 is not true, the len(ids) (batch size) is added to the affected counter anyway
  • So in the case that we try to merge (for example) 3 clusters, but 0 can be deleted, there is no warning/error but MergeFaces returns as if it did useful work.

Because the daemon runs this function every 15 minutes, it is highly likely that all face clusters which can be merged have already been merged. In a static library at steady state that has at least two clusters whose embeddings match (so they could be merged), but where the smaller cluster can't be removed because the new cluster does not match its markers, this loop will run 11 times, checking every cluster against all others with the same subj_uid each time.

This could be fixed in 2 ways:

  1. Fix PurgeOrphanFaces so that if no rows are affected, it returns 0. This will trigger the Warnf in MergeFaces and the Errorf in Optimize each time, but at least result.Merged won't be incremented, so the 11x loop won't run unnecessarily. The downside (besides the log output) is that if PurgeOrphanFaces or MergeFaces is used elsewhere and relies upon the broken behavior, those other consumers may no longer work as expected. A GitHub search doesn't turn up other consumers of either function, but it might have missed something.
  2. Find a different early-termination condition. This has the upside of not disrupting the current behavior. For example, since we query.ManuallyAddedFaces each time, we could track the number of face clusters returned (n), and terminate if n >= prev_n (i.e. the number of clusters increased or stayed the same). The downside here is that it would require an extra query of all faces (with their embeddings).
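As a sketch of fix (1), here is a minimal, self-contained Go model of the affected counting (purgeOrphans and deleteBatch are illustrative stand-ins, not the real query.PurgeOrphanFaces): the fixed variant simply stops crediting a batch when the delete affected zero rows.

```go
package main

import "fmt"

// purgeOrphans models the purge loop. deleteBatch stands in for the DB
// delete and returns the rows actually affected. The suspected bug is that
// the buggy variant credits the whole batch size even when nothing was
// deleted, so callers believe useful work happened.
func purgeOrphans(batches [][]string, deleteBatch func(ids []string) int64, buggy bool) int64 {
	var affected int64
	for _, ids := range batches {
		rows := deleteBatch(ids)
		if rows > 0 {
			affected += rows
		} else if buggy {
			// Suspected bug: batch size counted despite 0 rows deleted.
			affected += int64(len(ids))
		}
		// Fixed behavior: add nothing, so callers like MergeFaces see 0
		// and Optimize's early-termination check can finally fire.
	}
	return affected
}

func main() {
	batches := [][]string{{"a", "b", "c"}}
	noop := func(ids []string) int64 { return 0 } // nothing is deletable
	fmt.Println("buggy reports:", purgeOrphans(batches, noop, true))
	fmt.Println("fixed reports:", purgeOrphans(batches, noop, false))
}
```

The one-line difference is what lets the phantom "merged N clusters" counts disappear and the 11-pass loop exit early.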

@lastzero if you let me know if you prefer fix (1) or fix (2) more, I am happy to submit a PR.

There are a few other things I noticed while tracing this code:

In the Optimize function, the else if len(merge) == 1 { merge = nil } condition is probably a bug. It is trying to reset the set of mergeable faces for the case where it's the same subject but the faces don't match. In doing so, it skips a potential face to merge. Consider clusters A, B, C, D, E, F, all of the same subject:

  • A, B -> merge, reset
  • C (D does not match) so reset
  • Intended Behavior (I think): Try to match D with E
  • Actual Behavior: Try to match E with F

To fix this, change the code to else if len(merge) == 1 { merge = entity.Faces{faces[j]} }

A more aggressive refactoring could also improve performance significantly: rather than querying all faces each time through the loop, we could fetch the list of faces at the start, then only query for face IDs with a created_at constraint, fetching only the new faces. Better yet (though more involved), all of the operations could be kept in memory, with an append-only log of 'added' and 'removed' entries written to the DB once at the end. When there are hundreds of faces, this repeated querying consumes significant DB resources.

Edit: One further (but schema-changing) optimization would be to add a column optimized_at in the faces table. Whenever a face cluster was matched/added to a marker, this column would be cleared. In the Optimize loop, following an optimization run, each remaining cluster that was processed would have optimized_at set. Add one more argument to ManuallyAddedFaces(hidden, ignored, optimized) that only returns Faces where at least one face with that subj_uid has an empty optimized_at value... The query would be something like

```sql
SELECT *
FROM faces f
WHERE
    f.face_hidden = false
    AND f.face_src = 'manual'
    AND f.subj_uid <> ''
    AND f.face_kind <= 1
    AND EXISTS (
        SELECT 1
        FROM faces sub_f
        WHERE
            sub_f.subj_uid = f.subj_uid
            AND (sub_f.optimized_at IS NULL OR sub_f.optimized_at = '')
    )
ORDER BY f.subj_uid, f.samples DESC;
```

Or thereabouts. In this way, we'd only be processing faces where at least one marker had changed; rather than doing the same work over and over.

Insanity is doing the same thing over and over again and expecting different results - Albert Einstein


@theshadow27 commented on GitHub (Dec 30, 2024):

Have a PR going. No confirmed fix yet. https://github.com/photoprism/photoprism/pull/4691
BTW, tests are failing in WSL off `develop` - probably not your native env.


@theshadow27 commented on GitHub (Dec 30, 2024):

(from the PR for visibility by @lastzero )

This issue is deeper than initially suspected - the proposed fix (1) is necessary but not sufficient.

After implementing the proposed fix (1), the loop still runs all 11 times. It seems that when there are several clusters of the same size (in my case 6), each iteration can attempt to merge them, resulting in one new cluster being created and one old cluster being dropped.

Trying to grok what is going on here...

When the new cluster is created, its ID comes from the embeddings... the centroid of all possible merge candidates. Somehow, this results in a new face hash (rounding issues? order of operations?) and the creation of a new automatic cluster which replaces the old one. Since MatchMarkers eagerly asserts ownership over the markers, it is happy to take them from the other automatic cluster. The database is updated by adding the new cluster and removing the old cluster (effectively a no-op)... and the cycle repeats.

I assumed that the intent of the md5 face_id was to prevent exactly this problem, and initially speculated that a stable sort order from ManuallyAddedFaces might fix it, but thinking about it some more, I can see how it would not:

```
if A + B + C = D
then D + B + C = E
then E + B + C = F
...
```

each time, the centroid is shifted slightly, resulting in a new face_id - one that will be very close to, but not exactly, the previous one.

Initial thought: One way around this would be to do the match in-memory, i.e. MatchMarkers should store the results but not actually write anything. Then, it could be compared to the previous marker set, and if it was the same set of markers, not persist itself or take them over. Given the structure of the code, this is a complicated change. I suppose it could be done with less friction inside a TRANSACTION (with rollback) at the expense of additional DB overhead. I am not sure the best way to proceed here. If you have any suggestions @lastzero I would be happy to try them with my problematic db.

In the meantime, I will add a parameter to Optimize(subj_uid) that can be passed through to ManuallyAddedFaces for the special case of markers.UpdateMarker when the subj_uid is known in advance. Because of the locking here, this will only ever result in that subject being optimized (it's the only face cluster to have changed, and the merge set is all the same subj_uid). At least that will cut down on the amount of work this loop has to do while the user is blocked.

Edit: Noticed another problem (bug?). The sort in ManuallyAddedFaces orders by samples on the assumption that this value reflects reality. However, it seems that it is never updated - not by Optimize, not by photoprism faces update -f, not by photoprism faces audit --fix. Once it's there, it's set (as far as I have observed). In doing some spot checks, I have several cases where samples is something like 7-10, but there are 25+ markers with that face_id meeting the minimum score and size. Unless I'm missing something and the intent is not for it to denote how many markers are matched to the cluster?


@lastzero commented on GitHub (Dec 31, 2024):

@theshadow27 Thanks for taking a look and sharing your insights! 🧐

Our current plan is to offload (most of) the calculations to the new MariaDB vector functions in an upcoming release:

- https://mariadb.org/projects/mariadb-vector/

This should yield better results and be much faster than our own vector comparisons, which were intended as a temporary solution until we have a proper vector database/library in place. Unfortunately, as we are [currently performing a major upgrade of our frontend](https://github.com/photoprism/photoprism/issues/3168), I do not have any code or a more specific timeline/roadmap for this yet:

- https://github.com/photoprism/photoprism/issues/4669

The edge cases that need to be handled/investigated (see your comments and what I've commented in other places) further encourage us to go this way. If you like, you are welcome to experiment with the vector functions and work on a proof of concept? ✨

Of course, if you find improvements to the current code that we can test, merge, and release within a reasonable timeframe, we'd be happy to do that as well - but my feeling is that the effort required for a fast and completely bug-free implementation is beyond what anyone could maintain/contribute... 🤔

@theshadow27 commented on GitHub (Dec 31, 2024):

@lastzero thanks for getting back, and happy new years! 🥂

First off, IMHO the current implementation/"temporary" solution is VERY good at meeting its functional requirements - better than Apple Photos for sure. In the rare situation where it confuses a subject, it's straightforward to eject them and fix it. New markers are identified and matched quickly and accurately. Youth sports are a brutal use case, and I sincerely appreciate the effort that went into the current version! The only issue I have is with the nonfunctional UX requirement of not having excessive delays while naming many markers in quick succession.

To be clear: the current state of the draft PR offers a significant improvement to the UX; by limiting the Optimize function to the face supplied in the POST, it almost never triggers the 11-iteration loop, unless you are specifically working on one of those subjects that has problematic clusters. It is not a big PR (40 lines, 12 lines functional), and I don't believe it changes the behavior at all for the worker/CLI usage - a huge plus for building confidence after years of stable deployments.

Regarding the Vector implementation - I'm already thinking about it, but the scope will be much wider than this PR, with impacts from DBSCAN up to everywhere that Match is used (at least 11 places) - likely requiring substantial rework. In addition to schema updates, a Vector implementation requires a minimum DB version that could break a lot of existing embedded/hosted installs without manual intervention during the upgrade. To keep backwards compatibility, some version of the current implementation strategy needs to be encapsulated behind an interface, further expanding the scope of refactoring, and potentially running with two different Gorm entities depending on a flag. I don't know enough about Gorm to know if that's ok or not. This is a Big project, lots of new code, lots of testing, bug hunting, and so on.

As such, my suggestion is to try to get the minimum change for maximum UX improvement merged sooner than later - there are quite a few users who have reported this or derivative issues going back to October 2022. It doesn't fix the root cause, but it hides it well enough to make a big difference for those who are impacted. What do you think?

P.S. If you're not comfortable rolling it out everywhere, it is not much more to hide it behind a flag (like PHOTOSTREAM_EXPERIMENTAL_FACES_OPTIMIZE) that would eliminate nearly all of the risk for "normal" users doing routine upgrades.

P.P.S. I am working on another branch with a more substantial diff: a complete re-write of the Optimize function as I think it should work - breaking each subj_uid out into its own struct and operating on them independently, spending more effort on finding mergeable clusters within a subject, going back to ground-truth tagged markers instead of cluster-of-clusters, and only re-fetching faces when that particular subject has changed. This will be a several-hundred-line diff, so I don't expect it to be accepted without substantial testing that I will likely not finish on this holiday 😄.


@lastzero commented on GitHub (Mar 26, 2025):

@theshadow27 Thank you so much for your patience while we completed the frontend upgrade! 🙏

The last batch of UI/UX enhancements [was released last week](https://github.com/photoprism/photoprism/releases/tag/250321-57590c48b), so I'm happy to report that your PR is now merged and ready to be tested with our development preview build:

- https://docs.photoprism.app/release-notes/#development-preview
- https://docs.photoprism.app/getting-started/updates/#development-preview

Any help with that is much appreciated :)

@Coloradohusky commented on GitHub (Apr 6, 2025):

While I am not OP, I have definitely noticed speedups with manual tagging on the latest preview build, 250405-eed7bfff7 - times went from around 30 seconds to a minute+ before, to around 2-10 seconds now (although in certain cases it can still take 45 seconds, it is much better overall).


@lastzero commented on GitHub (Apr 7, 2025):

@Coloradohusky Thanks for testing! While the API could also be faster due to other performance improvements we've implemented in the meantime, it's definitely great to hear :)


@mat1990dj commented on GitHub (Jan 12, 2026):

I have to say that after the last November update (251130), this is happening again. I performed photoprism faces audit --fix and photoprism faces index, and I have a LOT to fix. After every change I need to wait more than one minute until the next action on faces:

```
time="2026-01-09T14:31:57Z" level=info msg="subject: added person 'Olivia Valles'"
time="2026-01-09T14:32:59Z" level=info msg="changes successfully saved"
```


@davidetoile commented on GitHub (Jan 12, 2026):

I confirm, same problem on my side...


@mat1990dj commented on GitHub (Jan 12, 2026):

additionally, I think we should also have a screen that shows detected faces ungrouped, with the ability to multiselect and then assign a person
