Labels: Load and use a custom TensorFlow model #103

Closed
opened 2026-02-19 23:02:20 -05:00 by deekerman · 49 comments
Owner

Originally created by @Kazakuri on GitHub (Jul 23, 2019).

Originally assigned to: @raystlin, @graciousgrey, @omerdduran on GitHub.

It'd be nice to have some sort of documented procedure for loading a different TensorFlow model into PhotoPrism.

I don't have much experience with machine learning, so I may be missing the mark with some of my assumptions, but I tried to look into it a bit and wrote down my findings here.


From my initial investigation, it looks like NASNet is [downloaded when building the Docker image](https://github.com/photoprism/photoprism/blob/de1a02694c82843a51bc8f5b91a07fec11f4414c/scripts/download-nasnet.sh), and the model is then [loaded from a static location in the application config](https://github.com/photoprism/photoprism/blob/801097c3688a0d86f293ed872ada0503b64484bf/internal/config/config.go#L411).

By default that's `~/.local/share/photoprism/resources/nasnet`, so to replace the TensorFlow model you can just mount that folder with a new asset, or otherwise change the `assets-path` in the config file.
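As a concrete illustration of that mount, here is a sketch only: the in-container path is an assumption based on the default `~/.local/share/photoprism/resources/nasnet` location with PhotoPrism running as root, and `/path/to/my-model` is a placeholder.

```shell
# Replace the bundled NASNet assets with a custom model folder.
# The folder must contain saved_model.pb and labels.txt.
# The container-side path assumes the default layout; adjust to your setup.
docker run -d \
  -v /path/to/my-model:/root/.local/share/photoprism/resources/nasnet \
  photoprism/photoprism
```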

From there I start to get a bit lost: when loading the model, it seems like it [loads a graph that's tagged with the name `photoprism`](https://github.com/photoprism/photoprism/blob/a995bb87de5c4b9fd529364a6d4a19e2dc49c41c/internal/photoprism/tensorflow.go#L162) and labels from `labels.txt` in the same folder. It might be helpful to make the name of the tag that's loaded part of the configuration, to make swapping out models easier.

[The documentation](https://godoc.org/github.com/tensorflow/tensorflow/tensorflow/go#LoadSavedModel) I found for `LoadSavedModel` doesn't tell me much, but my assumption is that it's only capable of loading protocol buffer models, so models must be trained and saved as `.pb` files or somehow converted to a `.pb` file.

The other thing I've noticed when looking through the code is that [PhotoPrism uses models trained at 224x224](https://github.com/photoprism/photoprism/blob/a995bb87de5c4b9fd529364a6d4a19e2dc49c41c/internal/photoprism/tensorflow.go#L244), so I assume that if a custom model was trained at a different size, then those numbers would need to be changed as well.


As a summary of what I've gathered it seems like these are the steps required:

  1. Get your model into the correct format (`.pb`).
    • I'm not sure how feasible this is, but maybe provide resources on how to do this for other common model formats, e.g. `.h5`, CNTK, `.pth`/`.pkl`?
  2. Update the name of the graph tag being loaded from the model.
    • Either by updating the source code, or ideally via a config option.
  3. Update the resolution that the images are scaled to.
    • Either by updating the source code, or ideally via a config option.
  4. Mount your model folder as a volume at `~/.local/share/photoprism/resources/nasnet`.
    • This should contain `saved_model.pb` and `labels.txt`.
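Steps 2 and 3 could be sketched as a small config layer. This is purely illustrative: the `PHOTOPRISM_MODEL_TAG` and `PHOTOPRISM_MODEL_RESOLUTION` variable names are invented for this sketch, not existing PhotoPrism options.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// modelConfig holds the two values that are currently hard-coded
// in the source: the SavedModel tag-set and the input resolution.
type modelConfig struct {
	Tag        string // tag-set to load (currently "photoprism")
	Resolution int    // input size the model was trained at (currently 224)
}

// loadModelConfig reads the hypothetical environment variables,
// falling back to the defaults the current code hard-wires.
func loadModelConfig() modelConfig {
	cfg := modelConfig{Tag: "photoprism", Resolution: 224}
	if tag := os.Getenv("PHOTOPRISM_MODEL_TAG"); tag != "" {
		cfg.Tag = tag
	}
	if res := os.Getenv("PHOTOPRISM_MODEL_RESOLUTION"); res != "" {
		if n, err := strconv.Atoi(res); err == nil && n > 0 {
			cfg.Resolution = n
		}
	}
	return cfg
}

func main() {
	cfg := loadModelConfig()
	fmt.Printf("tag=%s resolution=%d\n", cfg.Tag, cfg.Resolution)
}
```

With nothing set, this yields the current defaults; exporting the two variables would be all it takes to swap in a model with a different tag-set or input size.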

IssueHunt Summary

Backers (Total: $60.00)

  • $60.00 has been anonymously funded.

Become a backer now!

Or submit a pull request to get the deposits!

deekerman 2026-02-19 23:02:20 -05:00
Author
Owner

@lastzero commented on GitHub (Jul 24, 2019):

You are right, it's not that easy to change the model. We might build support for multiple models later. One of the reasons is that models are trained using different resolutions, so you can't just use a different model: you need thumbs of the right size, and you also need to know how to interpret the labels (linguistics). On top of that, you need the model in the right format for TensorFlow for C. I had to write some Python for that. Not easy to automate.

Author
Owner

@JimHinson commented on GitHub (Jul 24, 2019):

Would it work to convert all images and thumbnails from the first set, then use the model on the second set?

Thanks!

Author
Owner

@JimHinson commented on GitHub (Jul 24, 2019):

Sorry, let me clarify:
convert all images and thumbnails to the same resolution on both sets, then use the model from the first on the second set?

Author
Owner

@Kazakuri commented on GitHub (Jul 24, 2019):

It looks to me like the makeTensor function is capable of resizing the thumbnails again after they're created.

The classifyImage function in the indexer will always pass in the tile_224, left_224, or right_224 thumbnails, but the call to imaging.Fill scales and crops the image to the dimensions you give it.

There might be issues with scaling to a resolution bigger than the thumbnails, but if that function knew the resolution the model was trained at, couldn't it just handle the resizing there when classifying the image?

Author
Owner

@lastzero commented on GitHub (Jul 24, 2019):

We can also create thumbs on demand; it's just slower and you need to configure it. Right now, the code is "hard wired" to only work with our standard model, but it should be relatively easy to add config options later.

Author
Owner

@Kazakuri commented on GitHub (Jul 26, 2019):

Small info-dump update, since I had a bit more time to play around with this today. I was able to convert some models fairly successfully using [MMdnn](https://github.com/microsoft/MMdnn).

An example of one of the models:

$ saved_model_cli show --all --dir tf

MetaGraphDef with tag-set: 'train' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 299, 299, 3)
        name: Input3:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 6681, 1)
        name: StableSigmoid786596:0
  Method name is: tensorflow/serving/predict

From that output, I changed the tag-set to `train` instead of `photoprism`, changed the graph operations to `Input3` and `StableSigmoid786596` instead of `input_1` and `predictions/Softmax`, and changed the `imaging.Fill` width and height to 299 instead of 224.

After that I updated the `labels.txt` and gave it a shot, but unfortunately, since the output type of the model is `[][][]float32` instead of just `[][]float32`, the `bestLabels` function doesn't know how to process it. The inner array only ever seems to contain a single element, but I don't know if there's an easy way to get around that limitation in the model.

For what it's worth though, after some minor modifications to `bestLabels` to accept a `[][][]float32` and dump the arrays, it did look like I was getting back some pretty good results.
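A small sketch of the kind of adapter that could sit in front of `bestLabels` for such a model: the function name and shapes here are taken from the comment above, but the exact signature and error handling are assumptions, not PhotoPrism's actual code.

```go
package main

import "fmt"

// flattenOutput adapts a model whose output tensor decodes to
// [][][]float32 (batch x labels x 1) into the [][]float32
// (batch x labels) shape that a scoring function like bestLabels
// expects. It fails if the inner dimension isn't 1, since squeezing
// a larger dimension would silently drop scores.
func flattenOutput(out [][][]float32) ([][]float32, error) {
	flat := make([][]float32, len(out))
	for i, batch := range out {
		flat[i] = make([]float32, len(batch))
		for j, inner := range batch {
			if len(inner) != 1 {
				return nil, fmt.Errorf("expected inner dim 1, got %d", len(inner))
			}
			flat[i][j] = inner[0]
		}
	}
	return flat, nil
}

func main() {
	out := [][][]float32{{{0.1}, {0.7}, {0.2}}} // shape (1, 3, 1)
	flat, err := flattenOutput(out)
	if err != nil {
		panic(err)
	}
	fmt.Println(flat) // shape (1, 3), ready for label scoring
}
```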

Author
Owner

@lastzero commented on GitHub (Dec 30, 2019):

@Kazakuri Shall we keep this issue open? What exactly should be the outcome / acceptance criteria?

Author
Owner

@Kazakuri commented on GitHub (Dec 30, 2019):

I guess it sort of depends on what you and the other PhotoPrism devs want to do about custom / multiple models, but things like the model graph name and the thumbnail resolution could probably be moved into config options so that you're not forced to rebuild the code to swap out the model.

Like you said in your initial comment, there are still a handful of complications around getting a pretrained model into the format PhotoPrism expects, so if it's possible, some documentation in the wiki as a starting point for developers would be great. It could even end up being a whole separate repository dedicated to converting models into PhotoPrism's format, but that's extremely complex and well outside the scope of PhotoPrism.

Author
Owner

@lastzero commented on GitHub (Dec 31, 2019):

We need to be careful not to solve problems that don't (yet) exist. When adding a new model recently (the NSFW detector), I found that the code needed is so different that abstraction would only complicate things. Results even depend on how you resize images and create tensors from them. See https://github.com/photoprism/photoprism/issues/175.

You are right that our docs could be improved, and we also want to provide workshops for people who want to learn Go and/or TensorFlow. Maybe we can do this next year when our community becomes more active. Even in Berlin, days have only 24 hours, and we - the core team - also need to sleep and eat ;)

Are you interested in contributing?

Author
Owner

@Kazakuri commented on GitHub (Jan 4, 2020):

Ah, that makes sense, yeah. When I was making this issue I did think of NSFW detection, but I was initially looking more towards models for hair color and gender classification, or a completely separate use case: classifying digital art using something like [Danbooru2018](https://www.gwern.net/Danbooru2018).

I'm by no means a machine learning / TensorFlow expert, but it is something I'm fairly interested in getting more experience with. Similarly, I know enough about Go to hack around a project that uses it, but I wouldn't say I can program well using it.

I haven't had much of a chance to tinker with this project since September but it is still something I'm looking at getting back to playing with, so I would be happy to contribute whatever I can.

Author
Owner

@issuehunt-oss[bot] commented on GitHub (Apr 26, 2020):

An anonymous user has funded $60.00 to this issue.


Author
Owner

@mcombalia commented on GitHub (Jul 28, 2020):

It would be cool to have a generic pipeline for data labelling (e.g. see the Label Studio backend) and use it generically. New models would have to fit the backend (e.g. a generic image is given to the model, and the model takes care of transforming it to the resolution it needs, and so on).

Also, that would make the backend not library-dependent.

Author
Owner

@jared252016 commented on GitHub (Jul 26, 2021):

I am actually looking forward to using PhotoPrism to train a data set, taking advantage of the API and labels. I too am interested in being able to add custom models, similar to gender and hair/race classification.

Alternatively, I'm looking at crawling WebDAV and running it through the API. A simple way to achieve this for the time being may be to allow an on-import command to be run through the CLI, passing in some useful information such as the URL of the image. Processing could be done remotely as well, so I could use my GPU instead of just the CPU that the VM has.

Edit: See my issue https://github.com/photoprism/photoprism/issues/1450. I think it should be a fairly simple way to achieve what you are wanting to do. API call to add the label (not yet implemented in the client) after processing.

Author
Owner

@mxwi commented on GitHub (Jul 27, 2023):

Hey there,

Just wanted to ask if there is any new information on this. I like PhotoPrism very much, but the object classification is not really working for me: I get lots of false positives and unrecognised pictures.

We have many, many pictures of cats ;) - they are clearly visible and identifiable as cats, but not recognised by the model.

Author
Owner

@graciousgrey commented on GitHub (Jul 27, 2023):

Hey 👋

Thank you for your kind feedback!

Unfortunately, we haven't had time to work on this yet. Our [roadmap](https://github.com/orgs/photoprism/projects/5) shows what tasks are currently in progress and what will be implemented next.

We kindly ask everyone to use [GitHub Discussions](https://github.com/photoprism/photoprism/discussions) or our [Community Chat](https://link.photoprism.app/chat) instead of GitHub Issues for general questions. Thank you very much ❤️

Author
Owner

@yggdrasil75 commented on GitHub (Sep 28, 2023):

https://www.reddit.com/r/photoprism/comments/10deccq/just_installed_photoprism_cant_stop_laughing_at/
At the very least, can the NASNet model be offered in a higher-resolution option? Currently its tagging is poor: I just tried this, can't get any decent tags, and I am not manually tagging my thousands of images.
Allow the user to set nasnet-c 1@226 (whatever you use by default) or nasnet-a 6@4032. I would love to use CLIP instead, particularly ViT-L/14, because I have a beast of a home server, but I understand that the smallest model possible is used for weaker computers. I get not putting a lot of effort into supporting different types of models, like ResNet, ViT, and so on, but NASNet has better models than the one included.

Author
Owner

@lastzero commented on GitHub (Sep 28, 2023):

Whatever your opinion is, it was good enough for many others to build on our work. A larger model (with higher resolution) is possible, but significantly larger and slower. Our next releases will focus on OIDC, 2FA, and Batch Edit. When that is done, we might be able to take a look at this. It's been a while since we developed our image classification, so most likely there are better models available now.


@aa333 commented on GitHub (Jul 4, 2024):

ML instruments and frameworks are growing explosively. I am not expecting a small core team to stay on top of it; you'd need a separate crew of DS folks just to keep up. It doesn't seem feasible with your business model.

This ticket seems to describe the best course of action: make a good abstracted API, or at least ease the model swap and allow users to experiment in this area. I have some experience building media pipelines and unifying wildly different models (in client-side JS, with 30 FPS and dynamic resizing), and I know how tricky it can be. So let the ML community jump in and help. That would make PhotoPrism massively more powerful for all of us.


@lastzero commented on GitHub (Jul 4, 2024):

@aa333 That is our plan. At the moment, however, we are still [working on OpenID Connect (OIDC)](https://github.com/photoprism/photoprism/issues/782#issuecomment-2203104905). Once it has been released, we will add support for additional/external models as one of the next steps, e.g.:

- https://github.com/photoprism/photoprism/discussions/3419#discussioncomment-9638951

Any help with that would be much appreciated!


@lastzero commented on GitHub (Apr 6, 2025):

The above changes allow the computer vision models to be configured through an (optional) `vision.yml` configuration file. However, since our current label model seems to be quite special when it comes to its input and output parameters, I wasn't able to simply replace it with another model, i.e. it will take some work to figure things out. Any help with this would be much appreciated!


@lastzero commented on GitHub (Apr 6, 2025):

Since the [upgrade from TensorFlow v1.15.2 to v2.18.0](https://github.com/photoprism/photoprism/issues/222) is pretty much complete, it should generally be possible to [download saved TensorFlow models](https://www.kaggle.com/models/google/efficientnet-v2) and then put them in a subdirectory of the assets path (just like the other models).

The basic idea is that you can then use them instead of the default model by changing the configuration in `config/vision.yml`:

```yaml
Labels:
- Name: Nasnet
  Resolution: 224
  Tags:
  - photoprism
Thresholds:
  Confidence: 10
```

Unfortunately, the **input/output parameters** of the existing model do not appear to be standard; for example, it uses `input_1` instead of just `input` for the input:

https://github.com/photoprism/photoprism/blob/35e9294d878a8ad91278fc652e3b44a662ae9d88/internal/ai/classify/model.go#L94-L99

In addition, other image classification models may require a **different thumbnail resolution**, such as 384x384 or 480x480 pixels instead of 224x224 pixels. Simply changing the resolution value in the `vision.yml` file will not automatically pass higher-resolution images to the model, since the thumbnails are generated before the inference is run. So for this to work, [the indexer should ask](https://github.com/photoprism/photoprism/blob/develop/internal/photoprism/index_labels.go) the [`vision` package for the right size](https://github.com/photoprism/photoprism/tree/develop/internal/ai/vision) before [passing the images](https://github.com/photoprism/photoprism/blob/develop/internal/ai/vision/labels.go)! 🗯


@lastzero commented on GitHub (Apr 7, 2025):

Here is **related documentation** that explains how to inspect saved models and what common input/output signatures are:

- https://www.tensorflow.org/guide/saved_model#details_of_the_savedmodel_command_line_interface
- https://www.tensorflow.org/hub/common_signatures/images#input
- https://www.tensorflow.org/hub/common_signatures/images#image_classification
- https://www.tensorflow.org/hub/reusable_saved_models

Based on this, it unfortunately seems that **image data needs to be transformed in different ways** for use with TensorFlow, as some models expect pixel values to be [floating point numbers in the range [-1, 1]](https://www.kaggle.com/models/spsayakpaul/vision-transformer/tensorFlow2/vit-b16-classification) and others [[0, 1]](https://www.kaggle.com/models/tensorflow/resnet-50).
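Such range differences can be handled with a single linear transform per channel value. A minimal sketch (the `rescalePixel` helper is hypothetical, not part of the PhotoPrism codebase; the target range for a given model must be checked against its documentation):

```go
package main

import "fmt"

// rescalePixel maps an 8-bit channel value (0..255) into the floating
// point range a given model expects, e.g. [0, 1] or [-1, 1].
func rescalePixel(v uint8, lo, hi float32) float32 {
	return lo + (float32(v)/255.0)*(hi-lo)
}

func main() {
	fmt.Println(rescalePixel(0, -1, 1))  // -1
	fmt.Println(rescalePixel(255, 0, 1)) // 1
}
```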

Also note that when working with image classification models, you usually need a list of the labels they were trained with, so you know what their output values mean (depending on how many output values they have, e.g. 1k or 21k):

- https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt
- https://storage.googleapis.com/bit_models/imagenet21k_wordnet_lemmas.txt

Since these terms seem to be pretty standardized, they are often not included in the model downloads, i.e. you have to get them separately and add them as `labels.txt` to the model directory, e.g. in `/assets/vision`. It is not possible to use the models otherwise.
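To illustrate how such a `labels.txt` is typically used, the index of the highest-scoring model output selects the corresponding line in the label file. A sketch (the `topLabel` helper is hypothetical, and the scores and labels are made up for the example):

```go
package main

import (
	"fmt"
	"strings"
)

// topLabel returns the label whose line number in labels.txt matches
// the index of the highest-scoring model output.
func topLabel(scores []float32, labelsTxt string) string {
	labels := strings.Split(strings.TrimSpace(labelsTxt), "\n")
	best := 0
	for i, s := range scores {
		if s > scores[best] {
			best = i
		}
	}
	if best < len(labels) {
		return labels[best]
	}
	return ""
}

func main() {
	labels := "background\ntench\ngoldfish"
	fmt.Println(topLabel([]float32{0.01, 0.2, 0.79}, labels)) // goldfish
}
```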


@raystlin commented on GitHub (Apr 7, 2025):

Do you mind if I have a go at this or do you plan to tackle it yourself?


@lastzero commented on GitHub (Apr 7, 2025):

Please go ahead! I wish I had the time to keep working on this, but there's just too many other things waiting for attention. If you need anything, like additional settings/model parameters, just let me know :)


@lastzero commented on GitHub (Apr 7, 2025):

A few more notes on this, so that my findings and the current state are publicly documented:

(a) My suggestion would be to store additional models in a subdirectory of the assets path, e.g. /assets/vision, and then reference them with e.g. vision/resnet in the vision.yml model settings. When I experimented with additional models directly in assets (where the others are), it got way too crowded and I had to repeatedly update my .gitignore file not to commit them.

(b) I would also recommend starting with models that expect the [same thumbnail resolution, i.e. 224x224](https://docs.photoprism.app/developer-guide/media/thumbnails/#standard-sizes), as this reduces test/development complexity and makes the results of new models more comparable to existing ones. We can later refactor the code to allow passing higher-resolution thumbs to the models (or have TensorFlow generate batches from higher-resolution images, since the thumbs are square; so if you have a 16:9 image, you want to use multiple thumbs that include different areas of the photo).

(c) Additional/improved functionality, e.g. for generating tensors from images, can be added to the /internal/ai/tensorflow package. I created it yesterday to remove duplicate code in the individual model packages, but there's probably much more room for refactoring as it becomes clear how best to handle/abstract different models. The test coverage is also not good yet.


@lastzero commented on GitHub (Apr 7, 2025):

Some potentially more accurate/powerful image classification models and the **original** imagenet 1k/21k labels have been [uploaded to dl.photoprism.app](https://dl.photoprism.app/tensorflow/vision/), so they are easy to find and install from there (not all of them use 224x224 thumbnails, so you need to check this with the `saved_model_cli show` command that is [documented on tensorflow.org](https://www.tensorflow.org/guide/saved_model#details_of_the_savedmodel_command_line_interface), or read the [description on kaggle.com](https://www.kaggle.com/models/)):

- https://dl.photoprism.app/tensorflow/vision/

Note that PhotoPrism currently uses a **modified** list of imagenet1k labels, as we didn't initially have a mapping configuration for the label names you see in the user interface, i.e. we edited some terms directly in `/assets/nasnet/labels.txt`. So if you're using an imagenet1k model, it's better to use our existing `labels.txt` in `/assets/nasnet` or the [nasnet.zip](https://dl.photoprism.app/tensorflow/nasnet.zip) archive with it, as it's been tested and works for our needs.


@raystlin commented on GitHub (Apr 8, 2025):

My idea is to unmarshal the `saved_model.pb` file (graft includes the protobuffer classes) and look for the model information there. I know I can get the tag names, the input and output names, the expected input resolution, and the expected number of output tags, but I still have to investigate whether the output range is also available.


@raystlin commented on GitHub (Apr 11, 2025):

I have investigated the uploaded models, and I think the following additional configuration parameters are needed to override or set values when guessing is not possible or fails:

Inputs:

  1. The name of the input layer and the index of the output to use. For models with signatures, I can retrieve this information from the signatures, but models created using TF 1 (and sometimes new models as well) lack them.
  2. A scale factor to apply to the input values. The Transformer model accepts [-1, 1] instead of [0, 1], and I assume all the range variations can be solved by a linear transform.

Outputs:

  1. The name of the output layer and the index of the output to use. Same problem with signatures.
  2. A flag to mark whether the model returns probabilities or `logits`, which have to be transformed into probabilities using a `softmax` function.
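For the logits case, the conversion is the standard softmax. A minimal Go sketch (illustrative only, not the code on the branch):

```go
package main

import (
	"fmt"
	"math"
)

// softmax converts raw logits into probabilities that sum to 1.
// Subtracting the maximum logit first keeps math.Exp numerically stable.
func softmax(logits []float32) []float32 {
	maxLogit := logits[0]
	for _, v := range logits[1:] {
		if v > maxLogit {
			maxLogit = v
		}
	}
	var sum float64
	exps := make([]float64, len(logits))
	for i, v := range logits {
		exps[i] = math.Exp(float64(v - maxLogit))
		sum += exps[i]
	}
	probs := make([]float32, len(logits))
	for i, e := range exps {
		probs[i] = float32(e / sum)
	}
	return probs
}

func main() {
	fmt.Println(softmax([]float32{1, 2, 3}))
}
```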

On my repository branch `feature/custom-tf-model-222` I have added the structures I try to deduce from the models:

```go
// Input description for a photo input for a model
type PhotoInput struct {
	Name        string
	OutputIndex int
	Height      int64
	Width       int64
	Channels    int64
}

// The output expected for a model
type ModelOutput struct {
	Name          string
	OutputIndex   int
	NumOutputs    int64
	OutputsLogits bool
}

// The meta information for the model
type ModelInfo struct {
	TFVersion string
	Tags      []string
	Input     *PhotoInput
	Output    *ModelOutput
}
```

A `Scale` parameter should be added to `PhotoInput`, but I will do it later because I have to test the model first to be sure.
I could load and run all the models except `inception-v3-tensorflow2-tf2-preview-classification-v4.tar.gz`, where I could not deduce the layers and shapes.


@lastzero commented on GitHub (Apr 13, 2025):

@raystlin Wow, that looks great! 👍

In the meantime, I've further refined the (remote service) model configuration in the [new `vision.yml` file](https://github.com/photoprism/photoprism/blob/develop/internal/ai/vision/testdata/vision.yml), so we can release what we already have and start testing it with the [external `photoprism/photoprism-vision` service](https://github.com/photoprism/photoprism-vision) we've been working on. We also added CLI commands to check the model configuration (`photoprism vision ls`) and to run selected models on a subset of pictures by specifying a search filter and the model types, e.g.:

```
photoprism vision run --models=caption year:2025
```

I would suggest adding the additional input and output options you propose in a later release, so as not to delay the next one. Have you only added the configuration options in your branch, or are they already working, i.e. can other models be integrated with them? If not, would you like to continue working on this and get it to a usable/useful state so that we can release it?

Note that you can also reach us via email and on Element/Matrix if you have any questions or feedback, and we'd be happy to invite you to our Contributors Chat! 💬


@raystlin commented on GitHub (Apr 13, 2025):

I think the coding is complete, but a lot of testing remains to be done, because I did unit testing for the models but did not perform large tests on the whole system (i.e. `photoprism vision run`).

My commits are still lacking, at least, the following things:

  • When labels are not 1k, the rules do not apply to them. I don't really know how to tackle this... should we have rules for them as we have for 1k, or should the user provide them somehow? By the way, I added 21k labels to `nasnet` in `download-nasnet.sh` so there is a default place to look for them if the output size matches.
  • Thumbnail resizing (I mean, thumbnail creation with a model-based resolution) has not been solved in my commits. I don't know if `photoprism` is already doing it.
  • Facenet is untouched, but I have many things to discuss about it...

That being said, I can wait for the release and merge the commits to come until then or do the PR now. Whatever you prefer.


@raystlin commented on GitHub (May 12, 2025):

It's been a while, but I have some news regarding this issue.

  • **Regarding classification models**: I have tested them using `photoprism vision run` against the Flickr 30k photoset, and so far I think the results are fine. Changing the model can lead to a huge increase in image processing times (`transformer` and `efficientnet_m` are much, much slower than the others), but the classification tags provided by the new model are fine.
  • **Regarding nsfw models**: I made the minimum amount of changes to have a common structure between all the models, but this model is so specific that I think the way it is now is not more useful than having a fixed model. However, I feel this feature could be much more useful understood as a general filter... Let me explain with an example: I do not have nsfw images, but (by mistake) I sent all the images on my mobile phone to `photoprism`, and now I have thousands of meme images I would like to filter. The NSFW filter is rather useless for me, but a meme filter would be perfect; however, the output categories are hardcoded... What I would propose is to change these output categories to generic Yes/No (True/False or whatever) via a `labels.txt` file served along with the model. What do you think about this approach?
  • **Regarding face models**: Making them changeable would mean a really deep change in the way embeddings are defined and used... It could be done, but I think a deep discussion is needed here.

@lastzero commented on GitHub (May 13, 2025):

> **Regarding classification models**: I have tested them using `photoprism vision run` against the Flickr 30k photoset, and so far I think the results are fine. Changing the model can lead to a huge increase in image processing times (`transformer` and `efficientnet_m` are much, much slower than the others), but the classification tags provided by the new model are fine.

That's great news! 🎉

We've also made progress with the dedicated vision service that runs on Python. It now supports additional models and has a more capable API. For example, it now accepts data URLs, and the model to use can be specified in the request data:

- https://github.com/photoprism/photoprism-vision/pull/5
- https://github.com/photoprism/photoprism-vision/pull/10
- https://github.com/photoprism/photoprism-vision/issues/1

Please note that the [`photoprism/vision`](https://hub.docker.com/r/photoprism/vision) image on Docker Hub has not yet been updated, so you still need to build from source to test these changes.

As you mentioned, the downside is that such advanced models are often much slower, i.e. they should run asynchronously or be used with the photoprism vision run command as needed.

I need to work on some completely unrelated tasks right now, but will be happy to implement asynchronous indexing when I get a chance. That way, users can start browsing their pictures before the labels, faces, and captions have all been detected or generated.

> **Regarding nsfw models**: I made the minimum amount of changes to have a common structure between all the models, but this model is so specific that I think the way it is now is not more useful than having a fixed model. However, I feel this feature could be much more useful understood as a general filter... Let me explain with an example: I do not have nsfw images, but (by mistake) I sent all the images on my mobile phone to `photoprism`, and now I have thousands of meme images I would like to filter. The NSFW filter is rather useless for me, but a meme filter would be perfect; however, the output categories are hardcoded... What I would propose is to change these output categories to generic Yes/No (True/False or whatever) via a `labels.txt` file served along with the model. What do you think about this approach?

That seems possible, though general filters might represent a slightly different use case:

  • NSFW detection was added when our project became popular enough for people to frequently upload pr0n to our public demo instances.
  • As a result, the NSFW detection was deeply integrated with the specific goal of preventing such uploads.
  • The option to flag pictures as "private" is just bonus functionality. IMHO, it doesn't work very reliably since the thresholds are different from those for uploads, and it's probably not used very often.

So, I guess a general filter may/must be less aggressive and offer different capabilities and configuration options? Should all filters be able to block web uploads?

> **Regarding face models**: Making them changeable would mean a really deep change in the way embeddings are defined and used... It could be done, but I think a deep discussion is needed here.

That's absolutely true, and we would only switch to a different face embedding model if it offered advantages over the current model, which is unclear at this point:

  • Some of the [known issues with face recognition](https://docs.photoprism.app/known-issues/#face-recognition) could be caused by our vector/embeddings library, which I wrote before other solutions were available.
  • So it's possible that we can keep the model if the problems can be solved by doing the math in Qdrant or MariaDB.
  • However, if not, it would be helpful for us, as developers, to configure and test other models that could be used as the standard in future releases.

Do you happen to know of any models or Go libraries that can more reliably detect faces, i.e. the step before generating an embedding? We currently use https://github.com/esimov/pigo, but we may need a more capable solution unless I haven't used or configured it properly...

@raystlin commented on GitHub (May 13, 2025):

The only library I have seen in Go to perform face detection is pigo. However, if we want to change and use an AI model, there is a project called insightface that is focused on face detection/recognition. I think the models they use (https://github.com/deepinsight/insightface/tree/master/model_zoo) are ONNX, but we can probably find them on Kaggle as TF 2.0.

Regarding the other points, I understood that Faces and NSFW are going to remain, at least for now, as they are. So, the only point left for me to solve is the labels_21k rules. I can do the PR without it, or spend some time on it (it does not seem to be a quick task 😄)

@lastzero commented on GitHub (May 14, 2025):

If the face model requires us to run Python code, then it's probably not suitable for direct integration with our Go backend, but could instead be added to the dedicated photoprism-vision service so that it can be (optionally) used via a REST API. Even though it's probably slower, some users might prefer it for better face detection and/or recognition.

As for NSFW detection, it would be great to replace the current model with a better one that isn't significantly slower to reduce the number of false positives and negatives. However, I realize that it may take some time to find and test one, so I wouldn't expect this to be a quick win. Since the dedicated photoprism vision service should now also include a (different) NSFW model, we could try that for comparison as a next step (I haven't had time to do that yet).

If 21k image classification isn't a quick win either (as it sounds), then how about merging your existing improvements first and doing those models later (depending on your time and motivation)?

@raystlin commented on GitHub (May 14, 2025):

Sure, I'll open the PR by the end of the day. I'll keep the task of doing the 21k label rules and, as a side task, a version of face detection using AI in Go.
If you find a suitable NSFW model, I can do the merging as well.

@lastzero commented on GitHub (May 17, 2025):

@raystlin Thanks a lot! I'll review and merge your pull request as soon as possible 👍

Having thought about it again, I believe it would be easier to generate CLIP embeddings and implement a search with those than to generate 21,000 different labels suitable only for regular text-based searches?

Another option for improving regular text-based searches could be to extract text from images using OCR. While I am not aware of any OCR models available on Kaggle, it might be possible to use generic multimodal models instead?

Would your improved TensorFlow integration/configuration allow any of this without requiring major changes?

@raystlin commented on GitHub (May 17, 2025):

I have not worked with CLIP in Go before. I think the change would not be huge, but I need to inspect and try the model to confirm it... @lastzero Could you upload a CLIP model you know that runs fast to do the testing?
I really like the idea of generating embeddings instead of labels, but I have one concern with CLIP: performance.
Following the path of generating embeddings, another idea could be to use `word2vec` on the classification result, which would allow us to reuse almost the same layer of models we are using, plus a little additional processing on the labels and queries. The data flow of the application would be slightly different:

## classification + embeddings

**Initialization**

```
labels.txt -> word2vec -> embeddings -> vector_db
```

**Indexing**

```
image -> classification -> labels -> database
```

**Search**

```
query -> word2vec -> embeddings -> vector_db -> labels -> database -> images
```

## CLIP

**Initialization**

Nothing

**Indexing**

```
image -> CLIP -> embeddings -> vector_db
```

**Search**

```
query -> CLIP -> embeddings -> vector_db -> images
```

I feel `word2vec` would be faster... But I don't know if it would make sense to have both flows at the same time.
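As an illustration, the word2vec-based search flow described above boils down to a nearest-neighbor lookup over label embeddings. Here is a minimal pure-Go sketch, with toy hand-made vectors standing in for real word2vec output (all labels and vectors below are made up for the example):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	// Toy label embeddings; a real setup would load labels.txt and
	// run each label through a word2vec model at initialization.
	labels := map[string][]float64{
		"cat":   {0.9, 0.1, 0.0},
		"dog":   {0.6, 0.4, 0.2},
		"beach": {0.0, 0.9, 0.4},
	}

	// Query embedding, e.g. word2vec("kitten") in a real system.
	query := []float64{0.85, 0.15, 0.05}

	type match struct {
		label string
		score float64
	}
	var matches []match
	for l, v := range labels {
		matches = append(matches, match{l, cosine(query, v)})
	}
	// Rank labels by similarity; the top match maps back to photos
	// via the regular labels table in the database.
	sort.Slice(matches, func(i, j int) bool { return matches[i].score > matches[j].score })

	fmt.Println(matches[0].label) // prints "cat"
}
```

A real vector database (e.g. Qdrant) would replace the linear scan, but the ranking logic is the same.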

@lastzero commented on GitHub (May 21, 2025):

Note that starting with v11.7, MariaDB includes [native vector functions](https://mariadb.org/projects/mariadb-vector/). We also have a [Qdrant installation script](https://github.com/photoprism/photoprism/blob/develop/scripts/dist/install-qdrant.sh), so it may be possible to perform searches with it in future releases. In addition to vectors, Qdrant also supports filtering by text, dates, and coordinates. So, it could potentially replace MariaDB entirely for searches or serve as an alternative for certain use cases.

@lastzero commented on GitHub (Aug 4, 2025):

I've started a new preview build to test these changes! It will soon be available on Docker Hub:

- https://docs.photoprism.app/getting-started/updates.html#development-preview

Note that it would be good to have some specific configuration and usage examples ready for end users before releasing these improvements in the stable version. @graciousgrey and @omerdduran are happy to help with this.

@lastzero commented on GitHub (Aug 4, 2025):

@raystlin Initial testing of the new preview build on our [public demo](https://demo.photoprism.app/) shows that the number of detected pictures and labels remains the same with these changes, which is generally a good sign! Of course, improvements would be great as well, though this would probably require the use of other models...

### Original Labels

![Image](https://github.com/user-attachments/assets/84c22b3b-82b5-4c7e-95b2-b48f09841643)

### Labels After Merging the PR

![Image](https://github.com/user-attachments/assets/508c7455-1060-4443-955e-623d02de3e60)

@lastzero commented on GitHub (Aug 4, 2025):

@raystlin @omerdduran @graciousgrey If you have a custom `config/vision.yml` file, please note that it must be updated to avoid TensorFlow/Vision errors when generating labels (due to missing metadata that is now required). That is, unless additional changes are made to handle missing configuration data - at least for the built-in default models?

A new, valid `vision.yml` file can be created with the following command:

```bash
photoprism vision save
```

@lastzero commented on GitHub (Aug 4, 2025):

With the changes I will push shortly, this should work for using the default configurations in vision.yml files:

```yaml
Models:
- Type: labels
  Default: true
- Type: nsfw
  Default: true
- Type: face
  Default: true
```

So all you need to do is set the "Default" flag (and remove it if you want to use a custom config)!

@raystlin commented on GitHub (Aug 4, 2025):

I have tested this (with good results) using [efficientnet-v2-tensorflow2-imagenet1k-b0-classification-v2.tar.gz](https://dl.photoprism.app/tensorflow/vision/efficientnet-v2-tensorflow2-imagenet1k-b0-classification-v2.tar.gz) and [efficientnet-v2-tensorflow2-imagenet1k-m-classification-v2.tar.gz](https://dl.photoprism.app/tensorflow/vision/efficientnet-v2-tensorflow2-imagenet1k-m-classification-v2.tar.gz).
The `vision.yaml` files I used were:

```yaml
---
#b0
Models:
- Type: labels
  Name: efficientnet_b0
  Version: mobile
  Resolution: 224
  Output:
      Logits: true
- Type: nsfw
  Default: true
- Type: face
  Default: true
- Type: caption
  Name: qwen2.5vl
  Version: latest
  Prompt: Create an interesting caption that sounds natural and briefly describes
    the visual content in up to 3 sentences. Avoid text formatting, meta-language,
    and filler words. Do not start captions with phrases such as "This image", "The
    image", "This picture", "The picture", "A picture of", "Here are", or "There is".
    Instead, start describing the content by identifying the subjects, location, and
    any actions that might be performed. Use explicit language to describe the scene
    if necessary for a proper understanding.
  Resolution: 720
  Service:
    FileScheme: data
    RequestFormat: vision
    ResponseFormat: vision
Thresholds:
  Confidence: 10
```

```yaml
---
#m
Models:
- Type: labels
  Name: efficientnet_m
  Version: mobile
  Resolution: 480
  Output:
      Logits: true
- Type: nsfw
  Default: true
- Type: face
  Default: true
- Type: caption
  Name: qwen2.5vl
  Version: latest
  Prompt: Create an interesting caption that sounds natural and briefly describes
    the visual content in up to 3 sentences. Avoid text formatting, meta-language,
    and filler words. Do not start captions with phrases such as "This image", "The
    image", "This picture", "The picture", "A picture of", "Here are", or "There is".
    Instead, start describing the content by identifying the subjects, location, and
    any actions that might be performed. Use explicit language to describe the scene
    if necessary for a proper understanding.
  Resolution: 720
  Service:
    FileScheme: data
    RequestFormat: vision
    ResponseFormat: vision
Thresholds:
  Confidence: 10
```

Please take into account that missing fields such as the input/output layer names or the model tags are read from the `saved_model` file, while others have default values:

* Default ColorChannelOrder: RGB
* Default ResizeOperation: CenterCrop
* Default Output Logits: False (by default we expect the last layer to be a Softmax).
* Default Input Intervals: [0, 1] (if just one interval is specified, it will be used for all three channels).

@lastzero commented on GitHub (Aug 5, 2025):

@raystlin Thanks a lot! This should help @graciousgrey get started with testing and documenting the improvements. 📚

It's possible that we'll have some follow-up questions for you, though. For example, did you implement or test your changes with other model types, like "nsfw" or "face"? Or, are these not yet supported or tested, so we should focus on labels only?

Based on our previous conversations, I assume that using TensorFlow with the "caption" model type to generate image descriptions isn't possible, as it would require a lot of additional work. That's why I recently implemented an [Ollama API integration](https://github.com/photoprism/photoprism/issues/5123), which should suffice for now.

However, one major pain point for users is that the [face detection](https://github.com/photoprism/photoprism/issues/4669) provided by the https://github.com/esimov/pigo package isn't reliable and often misses faces. Would it make sense to use one of these *Object Detection* models to locate faces (or other objects) in images instead? If so, how difficult would it be to integrate it with our current code base? 🤔

- https://www.kaggle.com/models?task=17074

@raystlin commented on GitHub (Aug 5, 2025):

It has only been fully done for `labels`; however, some parts, like extracting model information from `saved_model`, have been applied to `nsfw` and `faces`. Fully porting it to `nsfw` should be quite straightforward, as it is really just another classification model.
If I remember correctly, `nsfw` applies some kind of preprocessing to the image using an on-the-fly TensorFlow layer composition that should be looked into carefully, but, apart from that, the rest should be more or less the same.
`faces` is a completely different thing. Changing the model would imply throwing away the stored vectors and recalculating everything again. Also, I guess that if the new model produces vectors of a different size, some parts of the code would need to be changed...
For `captions` I did not do anything, but I could look into it if you give me a model that you think is performant. I would also say that the model should not be too big...
Finally, migrating from pigo to a face detection model is something I could do. Maybe we could open a new issue for that. I know some models that could do the job, but maybe you have some suggestions.

@lastzero commented on GitHub (Aug 5, 2025):

> It has only been fully done for `labels`; however, some parts, like extracting model information from `saved_model`, have been applied to `nsfw` and `faces`. Fully porting it to `nsfw` should be quite straightforward, as it is really just another classification model.

Yes, it's essentially the same as label detection, i.e. image classification, just with a smaller set of labels and some post-processing to determine whether an image should be flagged as NSFW or rejected when being uploaded. Our current NSFW model is also not very good, so if there's a better model that's easier to integrate (e.g. one that's more similar to the models that are already supported), then we should use that instead. Even a model with a single percentage output value would do.

> `faces` is a completely different thing. Changing the model would imply throwing away the stored vectors and recalculating everything again.

True, it would only be worth looking into that later, once we have a migration strategy and another model that creates better embeddings. For now, face detection and vector comparison/clustering are the larger issues.

> For `captions` I did not do anything, but I could look into it if you give me a model that you think is performant.

Sure, but it's not urgent since we have the Ollama integration, which works reasonably well. It might be difficult to achieve better results or performance!?

> Finally, migrating from pigo to a face detection model is something I could do. Maybe we could open a new issue for that. I know some models that could do the job, but maybe you have some suggestions.

Awesome! That would be a great addition if you have time to work on it. Face detection is a huge pain point for our users right now, and I think this could significantly improve things while also making it possible to detect other objects of interest. The database tables are already prepared for that, only the data is missing. 🧐

@lastzero commented on GitHub (Aug 8, 2025):

@raystlin @graciousgrey These changes will load the models from `assets/models` instead of directly from the `assets` folder. This should make it easier for us to find, use, and add models.

You can prefix the directory names of models that you don't want to include in a release with an underscore, e.g. `assets/models/_resnet`.

Please let me know if this works for you or if you experience any problems, like broken tests. Thank you!

@graciousgrey commented on GitHub (Sep 22, 2025):

Related documentation can be found here: https://docs.photoprism.app/developer-guide/vision/custom-classification-model/

@lastzero commented on GitHub (Oct 1, 2025):

Note that with the latest changes in our preview build, you should be able to run custom TensorFlow models in the background after indexing or on a custom schedule:

- https://github.com/photoprism/photoprism/issues/5234

I have only tested this with Ollama models so far, but there is no reason it shouldn't work with TensorFlow models as well.
