Mirror of https://github.com/photoprism/photoprism.git (synced 2026-03-02 22:57:18 -05:00)
Load and use a custom TensorFlow model #103
Originally created by @Kazakuri on GitHub (Jul 23, 2019).
Originally assigned to: @raystlin, @graciousgrey, @omerdduran on GitHub.
It'd be nice to have some sort of documented procedure for loading a different TensorFlow model into PhotoPrism.
I don't have much experience with machine learning so I may be missing the mark with some of my assumptions but I tried to look into it a bit and wrote down my findings here.
From my initial investigation it looks like NASNet is downloaded when building the Docker image, and then the model is loaded from a static location in the application config.
By default that's `~/.local/share/photoprism/resources/nasnet`, so to replace the TensorFlow model you can just mount that folder with a new asset, or otherwise change the `assets-path` in the config file.

From there I start to get a bit lost. When loading the model, it seems to load a graph that's tagged with the name `photoprism` and labels from `labels.txt` in the same folder. It might be helpful to make the name of the tag that's loaded part of the configuration, to make changing out models easier.

The documentation I found for `LoadSavedModel` doesn't tell me much, but my assumption is that it's only capable of loading protocol buffer models, so models must be trained and saved as .pb files or somehow converted to a .pb file.

The other thing I've noticed when looking through the code is that PhotoPrism uses models trained at 224x224, so I assume that if a custom model was trained at a different size then those numbers would need to be changed as well.
As a summary of what I've gathered, it seems like these are the steps required: place a model saved as `saved_model.pb`, together with `labels.txt`, in `~/.local/share/photoprism/resources/nasnet`.
@lastzero commented on GitHub (Jul 24, 2019):
You are right, it's not that easy to change the model. We might build support for multiple models later. One of the reasons is that models are trained using different resolutions, so you can't just swap in a different model: you need thumbs of the right size, and you also need to know how to interpret the labels (linguistics). On top of that, you need the model in the right format for TensorFlow for C. I had to write some Python for that. Not easy to automate.
@JimHinson commented on GitHub (Jul 24, 2019):
Would it work to convert all images and thumbnails from the first set, then use the model on the second set?
Thanks!
@JimHinson commented on GitHub (Jul 24, 2019):
Sorry, let me clarify:
convert all images and thumbnails to the same resolution on both sets, then use the model from the first on the second set?
@Kazakuri commented on GitHub (Jul 24, 2019):
It looks to me like the
`makeTensor` function is capable of resizing the thumbnails again after they're created. The `classifyImage` function in the indexer will always pass in the `tile_224`, `left_224`, or `right_224` thumbnails, but the call to `imaging.Fill` scales and crops the image to the dimensions you give it.

There might be issues with having that scale to a resolution bigger than the thumbnails, but if that function knew the resolution the model was trained at, then couldn't it just handle the resizing there when classifying the image?
@lastzero commented on GitHub (Jul 24, 2019):
We can also create thumbs on demand, it's just slower and you need to configure it. Right now, the code is "hard-wired" to only work with our standard model, but it should be relatively easy to add config options later.
@Kazakuri commented on GitHub (Jul 26, 2019):
Small info dump update since I had a bit more time to play around with this today. I was able to convert some models fairly successfully using MMdnn.
An example of one of the models:
From that output, I changed the tag-set to
`train` instead of `photoprism`, changed the graph operations to `Input3` and `StableSigmoid786596` instead of `input_1` and `predictions/Softmax`, and changed the `imaging.Fill` width and height to 299 instead of 224.

After that I updated the `labels.txt` and gave it a shot, but unfortunately, since the output type of the model is `[][][]float32` instead of just `[][]float32`, the `bestLabels` function doesn't know how to process it. The inner array only ever seems to be a single element, but I don't know if there's an easy way to get around that limitation in the model.

For what it's worth though, after some minor modifications to `bestLabels` to accept a `[][][]float32` and dump the arrays, it did look like I was getting back some pretty good results.

@lastzero commented on GitHub (Dec 30, 2019):
@Kazakuri Shall we keep this issue open? What exactly should be the outcome / acceptance criteria?
@Kazakuri commented on GitHub (Dec 30, 2019):
I guess it sort of depends on what you and the other PhotoPrism devs want to do about custom / multiple models, but things like the model graph name and the thumbnail resolution could probably be moved into config options so that you're not forced to rebuild the code to swap out the model.
Like you said in your initial comment, there are still a handful of complications around getting a pretrained model into the right format that PhotoPrism is expecting, so if it's possible then some documentation in the wiki as a starting point for developers to go off of would be great. It could even end up being a whole other repository that's dedicated to doing model conversions into PhotoPrism's format but that's extremely complex and well outside the scope of PhotoPrism.
@lastzero commented on GitHub (Dec 31, 2019):
We need to be careful not to solve problems that don't (yet) exist. When adding a new model recently (the NSFW detector), I figured the code needed is so different that abstraction would only complicate things. Results even depend on how you resize images and create tensors from them. See https://github.com/photoprism/photoprism/issues/175.
You are right that our docs could be improved and we also want to provide workshops for people who want to learn Go and/or TensorFlow. Maybe we can do this next year when our community becomes more active. Even in Berlin, days have only 24 hours and we - the core team - also need to sleep and eat ;)
Are you interested in contributing?
@Kazakuri commented on GitHub (Jan 4, 2020):
Ah, that makes sense, yeah. When I was making this issue I did think of NSFW detection, but I was initially looking more towards models for hair color and gender classification or a completely separate use case for classifying digital art using something like Danbooru2018.
I'm by no means a machine learning / TensorFlow expert, but it is something I'm fairly interested in getting more experience with. Similarly, I know enough about Go to hack around a project that uses it, but I wouldn't say I can program well using it.
I haven't had much of a chance to tinker with this project since September but it is still something I'm looking at getting back to playing with, so I would be happy to contribute whatever I can.
@issuehunt-oss[bot] commented on GitHub (Apr 26, 2020):
An anonymous user has funded $60.00 to this issue.
@mcombalia commented on GitHub (Jul 28, 2020):
It would be cool to have a generic pipeline for data labelling (e.g. see the Label Studio backend) and use that generically. New models would have to fit the backend (e.g. a generic image is given to the model, and the model has to take care of transforming it to its needed resolution and so on...).
Also, that would make the backend library-independent.
@jared252016 commented on GitHub (Jul 26, 2021):
I am actually looking forward to PhotoPrism being used for me to train a data set, taking advantage of the API and labels. I too am interested in being able to add custom models similar to gender classification and hair/race.
Alternatively, I'm looking at crawling WebDAV and running it through the API. A simple way to achieve this for the time being may be to just allow an on-import command to be run through the CLI, passing in some useful information such as the URL of the image. Processing could be done remotely as well, so I could use my GPU instead of just the CPU that the VM has.
Edit: See my issue https://github.com/photoprism/photoprism/issues/1450. I think it should be a fairly simple way to achieve what you are wanting to do. API call to add the label (NYI in the client) after processing.
@mxwi commented on GitHub (Jul 27, 2023):
hey there,
just wanted to ask if there is any new information on this. I like PhotoPrism very much, but the object classification is not really working for me: I get lots of false positives and unrecognized pictures.
We have many, many pictures of cats ;) that are clearly visible and identifiable as cats, but they are not recognized by the model.
@graciousgrey commented on GitHub (Jul 27, 2023):
Hey 👋
Thank you for your kind feedback!
Unfortunately, we haven't had time to work on this yet. Our roadmap shows what tasks are currently in progress and what will be implemented next.
We kindly ask everyone to use Github Discussions or our Community Chat instead of Github Issues for general questions. Thank you very much ❤️
@yggdrasil75 commented on GitHub (Sep 28, 2023):
https://www.reddit.com/r/photoprism/comments/10deccq/just_installed_photoprism_cant_stop_laughing_at/
at the very least, can the nasnet model used be made a higher-resolution option? Because currently its tagging is like trash: I just tried this and can't get any decent tags, and I am not manually tagging my thousands of images.
allow the user to set nasnet-c 1@226 (whatever you use by default) or nasnet-a 6@4032. I would love to use CLIP instead, particularly vit-l/14, because I have a beast of a home server, but I understand that the smallest model possible is used for weaker computers. I get not putting a lot of effort into supporting different types of models, like resnet, vit, and so on, but nasnet has better models than the one included.
@lastzero commented on GitHub (Sep 28, 2023):
Whatever your opinion is, it was good enough for many others to build on our work. A larger model (with higher resolution) is possible, but significantly larger and slower. Our next releases will focus on OIDC, 2FA, and Batch Edit. When that is done, we might be able to take a look at this. It's been a while since we developed our image classification, so most likely there are better models available now.
@aa333 commented on GitHub (Jul 4, 2024):
ML instruments and frameworks are growing explosively. I am not expecting a small core team to stay on top of it, you'd need a separate crew of DS folks just to keep up. It doesn't seem to be feasible with your business model.
This ticket seems to describe the best course of action: make a good abstracted API, or at least ease model swapping and allow users to experiment in this area. I have some experience building media pipelines and unifying wildly different models (in client-side JS, at 30 FPS with dynamic resizing), and I know how tricky it can be. So let the ML community jump in and help. That would make PhotoPrism massively more powerful for all of us.
@lastzero commented on GitHub (Jul 4, 2024):
@aa333 That is our plan. At the moment, however, we are still working on OpenID Connect (OIDC). Once it has been released, we will add support for additional/external models as one of the next steps, e.g.:
Any help with that would be much appreciated!
@lastzero commented on GitHub (Apr 6, 2025):
The above changes allow the computer vision models to be configured through an (optional)
`vision.yml` configuration file. However, since our current label model seems to be quite special when it comes to its input and output parameters, I wasn't able to simply replace it with another model, i.e. it will take some work to figure things out. Any help with this would be much appreciated!

@lastzero commented on GitHub (Apr 6, 2025):
Since the upgrade from TensorFlow v1.15.2 to v2.18.0 is pretty much complete, it should generally be possible to download saved TensorFlow models and then put them in a subdirectory of the assets path (just like the other models).
The basic idea is that you can then use them instead of the default model by changing the configuration in
`config/vision.yml`:

Unfortunately, the input/output parameters of the existing model are not standard, it seems; for example, it uses `input_1` instead of just `input` for the input: github.com/photoprism/photoprism@35e9294d87/internal/ai/classify/model.go (L94-L99)

In addition, other image classification models may require a different thumbnail resolution, such as 384x384 or 480x480 pixels instead of 224x224 pixels. Simply changing the resolution value in the `vision.yml` file will not automatically pass higher-resolution images to the model, since the thumbnails are generated before the inference is run. So for this to work, the indexer should ask the `vision` package for the right size before passing the images! 🗯

@lastzero commented on GitHub (Apr 7, 2025):
Here is related documentation that explains how to inspect saved models and what common input/output signatures are:
Based on this, it unfortunately seems that image data needs to be transformed in different ways for use with TensorFlow, as some models expect pixel values to be floating point numbers in the range [-1, 1] and others [0, 1].
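The two input ranges mentioned can be captured in a tiny conversion helper. This is a sketch only; the function name and flag are made up for illustration and are not a PhotoPrism API:

```go
package main

import "fmt"

// normalizePixel converts an 8-bit channel value (0-255) into the float
// range a given model expects: [0, 1], or [-1, 1] when zeroCentered is
// set. Which range applies depends on how the model was trained.
func normalizePixel(v uint8, zeroCentered bool) float32 {
	f := float32(v) / 255.0 // [0, 1]
	if zeroCentered {
		return f*2 - 1 // [-1, 1]
	}
	return f
}

func main() {
	fmt.Println(normalizePixel(0, true), normalizePixel(255, true))   // -1 1
	fmt.Println(normalizePixel(0, false), normalizePixel(255, false)) // 0 1
}
```

Picking the wrong range doesn't crash inference, it just silently degrades the predictions, which is why this detail matters when swapping models.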
Also note that when working with image classification models, you usually need a list of the labels they were trained with, so you know what their output values mean (depending on how many output values they have, e.g. 1k or 21k):
Since these terms seem to be pretty standardized, they are often not included in the model downloads, i.e. you have to get them separately and add them as
`labels.txt` to the model directory, e.g. in `/assets/vision`. It is not possible to use the models otherwise.

@raystlin commented on GitHub (Apr 7, 2025):
Do you mind if I have a go at this or do you plan to tackle it yourself?
@lastzero commented on GitHub (Apr 7, 2025):
Please go ahead! I wish I had the time to keep working on this, but there's just too many other things waiting for attention. If you need anything, like additional settings/model parameters, just let me know :)
@lastzero commented on GitHub (Apr 7, 2025):
A few more notes on this, so that my findings and the current state are publicly documented:
(a) My suggestion would be to store additional models in a subdirectory of the assets path, e.g.
`/assets/vision`, and then reference them with e.g. `vision/resnet` in the `vision.yml` model settings. When I experimented with additional models directly in assets (where the others are), it got way too crowded and I had to repeatedly update my `.gitignore` file to not commit them.

(b) I would also recommend starting with models that expect the same thumbnail resolution, i.e. 224x224, as this reduces test/development complexity and makes the results of new models more comparable to existing ones. We can later refactor the code to allow passing higher-resolution thumbs to the models (or have TensorFlow generate batches from higher-resolution images; since the thumbs are square, if you have a 16:9 image, you want to use multiple thumbs that include different areas of the photo).
(c) Additional/improved functionality, e.g. for generating tensors from images, can be added to the
`/internal/ai/tensorflow` package. I created it yesterday to remove duplicate code in the individual model packages, but there's probably much more room for refactoring as it becomes clear how best to handle/abstract different models. The test coverage is also not good yet.

@lastzero commented on GitHub (Apr 7, 2025):
Some potentially more accurate/powerful image classification models and the original imagenet 1k/21k labels have been uploaded to dl.photoprism.app, so they are easy to find and install from there (not all of them use 224x224 thumbnails, so you need to check this with the
`saved_model_cli show` command that is documented on tensorflow.org, or read the description on kaggle.com):

Note that PhotoPrism currently uses a modified list of imagenet1k labels, as we initially didn't have a mapping configuration for the label names you see in the user interface, i.e. we edited some terms directly in `/assets/nasnet/labels.txt`. So if you're using an imagenet1k model, it's better to use our existing `labels.txt` in `/assets/nasnet`, or the nasnet.zip archive with it, as it's been tested and works for our needs.

@raystlin commented on GitHub (Apr 8, 2025):
My idea is to unmarshal the file
`saved_model.pb` (graft includes the protobuffer classes) and look for the model information there.

I know I can get the tag names, the input and output names, the expected input resolution, and the expected number of output tags, but I have to investigate whether the output range is also available.
@raystlin commented on GitHub (Apr 11, 2025):
I have investigated the uploaded models, and I think the following additional parameters are needed to override or set values when guessing is not possible or fails:
Inputs:
Outputs:
logits, which have to be transformed into probabilities using a `softmax` function.

On my repository branch `feature/custom-tf-model-222` I have added the structures I try to deduce from the models:

A `Scale` parameter should be added to `PhotoInput`, but I will do it later because I have to test the model first to be sure.

I could load and run all the models except `inception-v3-tensorflow2-tf2-preview-classification-v4.tar.gz`, where I could not deduce the layers and shapes.

@lastzero commented on GitHub (Apr 13, 2025):
@raystlin Wow, that looks great! 👍
In the meantime, I've further refined the (remote service) model configuration in the new
`vision.yml` file, so we can release what we already have and start testing it with the external `photoprism/photoprism-vision` service we've been working on. We also added CLI commands to check the model configuration (`photoprism vision ls`) and to run selected models on a subset of pictures by specifying a search filter and the model types, e.g.:

I would suggest adding the additional input and output options you propose in a later release, so as not to delay the next one. Have you only added the configuration options in your branch, or are they already working, i.e. can other models be integrated with them? If not, would you like to continue working on this and get it to a usable/useful state so that we can release it?
Note that you can also reach us via email and on Element/Matrix if you have any questions or feedback, and we'd be happy to invite you to our Contributors Chat! 💬
@raystlin commented on GitHub (Apr 13, 2025):
I think the coding is complete, but I feel a lot of testing remains to be done because I did unit testing for models but did not perform large tests on the whole system (i.e.
`photoprism vision run`).

My commits are lacking, at least, the following things:
- `nasnet` in `download-nasnet.sh`, to have them as the default place to look for them if the output size matches.
- `photoprism` is already doing it.

That being said, I can wait for the release and merge the commits to come until then, or do the PR now. Whatever you prefer.
@raystlin commented on GitHub (May 12, 2025):
It's been a while, but I have some news regarding this issue.
- I ran `photoprism vision run` against the Flickr30k photo set, and so far I think the results are fine. Changing the model can lead to a huge increase in image processing times (`transformer` and `efficientnet_m` are much, much slower than the others), but the classification tags provided by the new model are fine.
- I use `photoprism` and now I have thousands of meme images I would like to filter. The NSFW filter is rather useless for me, but a meme filter would be perfect; however, the output categories are hardcoded... What I would propose is to change these output categories to generic Yes/No (True/False or whatever) via a `labels.txt` file served along with the model. What do you think about this approach?

@lastzero commented on GitHub (May 13, 2025):
That's great news! 🎉
We've also made progress with the dedicated vision service that runs on Python. It now supports additional models and has a more capable API. For example, it now accepts data URLs, and the model to use can be specified in the request data:
Please note that the `photoprism/vision` image on Docker Hub has not yet been updated, so you still need to build from source to test these changes.

As you mentioned, the downside is that such advanced models are often much slower, i.e. they should run asynchronously or be used with the
`photoprism vision run` command as needed.

I need to work on some completely unrelated tasks right now, but will be happy to implement asynchronous indexing when I get a chance. That way, users can start browsing their pictures before the labels, faces, and captions have all been detected or generated.
That seems possible, though general filters might represent a slightly different use case:
So, I guess a general filter may/must be less aggressive and offer different capabilities and configuration options? Should all filters be able to block web uploads?
That's absolutely true, and we would only switch to a different face embedding model if it offered advantages over the current model, which is unclear at this point:
Do you happen to know of any models or Go libraries that can more reliably detect faces, i.e. the step before generating an embedding? We currently use https://github.com/esimov/pigo, but we may need a more capable solution unless I haven't used or configured it properly...
@raystlin commented on GitHub (May 13, 2025):
The only library I have seen in Go to perform face detection is `pigo`. However, if we want to change and use an AI model, there is a project called `insightface` that is focused on face detection/recognition. I think the models they use (https://github.com/deepinsight/insightface/tree/master/model_zoo) are `onnx`, but we can probably find them on Kaggle as TF 2.0.
labels_21krules. I can do the PR without it, or spend some time on it (it does not seem to be a quick task 😄)@lastzero commented on GitHub (May 14, 2025):
If the face model requires us to run Python code, then it's probably not suitable for direct integration with our Go backend, but could instead be added to the dedicated photoprism-vision service so that it can be (optionally) used via a REST API. Even though it's probably slower, some users might prefer it for better face detection and/or recognition.
As for NSFW detection, it would be great to replace the current model with a better one that isn't significantly slower to reduce the number of false positives and negatives. However, I realize that it may take some time to find and test one, so I wouldn't expect this to be a quick win. Since the dedicated photoprism vision service should now also include a (different) NSFW model, we could try that for comparison as a next step (I haven't had time to do that yet).
If 21k image classification isn't a quick win either (as it sounds), then how about merging your existing improvements first and doing those models later (depending on your time and motivation)?
@raystlin commented on GitHub (May 14, 2025):
Sure, by the end of the day I will do the PR. I'll keep the task of doing the 21k label rules and, as a side task, a version of face detection using AI in Go.
If you find a suitable NSFW model, I can do the merging as well.
@lastzero commented on GitHub (May 17, 2025):
@raystlin Thanks a lot! I'll review and merge your pull request as soon as possible 👍
Having thought about it again, I believe it would be easier to generate CLIP embeddings and implement a search with those than to generate 21,000 different labels suitable only for regular text-based searches?
Another option for improving regular text-based searches could be to extract text from images using OCR. While I am not aware of any OCR models available on Kaggle, it might be possible to use generic multimodal models instead?
Would your improved TensorFlow integration/configuration allow any of this without requiring major changes?
@raystlin commented on GitHub (May 17, 2025):
I have not worked with CLIP in Go before. I think the change would not be huge, but I need to inspect and try the model to confirm it... @lastzero Could you upload a CLIP model you know that runs fast to do the testing?
I really like the idea of generating embeddings instead of labels, but I have one concern with CLIP: performance.
Following the path of generating embeddings, another idea could be to use
`word2vec` on the classification result, which would allow us to use almost the same layer of models we are using, plus a little bit of additional processing on the labels and queries. The data flow of the application would be slightly different:

classification + embeddings:
- Initialization
- Indexing
- Search

CLIP:
- Initialization: nothing
- Indexing
- Search
I feel `word2vec` would be faster... But I don't know if it would make sense to have both flows at the same time.
Note that starting with v11.7, MariaDB includes native vector functions. We also have a Qdrant installation script, so it may be possible to perform searches with it in future releases. In addition to vectors, Qdrant also supports filtering by text, dates, and coordinates. So, it could potentially replace MariaDB entirely for searches or serve as an alternative for certain use cases.
@lastzero commented on GitHub (Aug 4, 2025):
I've started a new preview build to test these changes! It will soon be available on Docker Hub:
Note that it would be good to have some specific configuration and usage examples ready for end users before releasing these improvements in the stable version. @graciousgrey and @omerdduran are happy to help with this.
@lastzero commented on GitHub (Aug 4, 2025):
@raystlin Initial testing of the new preview build on our public demo shows that the number of detected pictures and labels remains the same with these changes, which is generally a good sign! Of course, improvements would be great as well, though this would probably require the use of other models...
Original Labels
Labels After Merging the PR
@lastzero commented on GitHub (Aug 4, 2025):
@raystlin @omerdduran @graciousgrey If you have a custom
`config/vision.yml` file, please note that it must be updated to avoid TensorFlow/Vision errors when generating labels (due to missing metadata that is now required). That is, unless additional changes are made to handle missing configuration data, at least for the built-in default models?

A new, valid `vision.yml` file can be created with the following command:
With the changes I will push shortly, this should work for using the default configurations in
`vision.yml` files:

So all you need to do is set the "Default" flag (and remove it if you want to use a custom config)!
@raystlin commented on GitHub (Aug 4, 2025):
I have tested this (with good results) using efficientnet-v2-tensorflow2-imagenet1k-b0-classification-v2.tar.gz and efficientnet-v2-tensorflow2-imagenet1k-m-classification-v2.tar.gz.
The vision.yaml file I used was:
Please take into account that missing fields, such as the input/output layer names or the model tags, are read from the saved_model file, while others have default values:

@lastzero commented on GitHub (Aug 5, 2025):
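The vision.yaml contents raystlin used were not captured above. Purely as an illustration of the behavior described (omitted layer names and tags being read from saved_model, other fields falling back to defaults), an entry for one of the EfficientNetV2 archives might look roughly like this; the field names and structure are assumptions, not the actual file:

```yaml
# Illustrative sketch only - not a verified schema.
Models:
  - Type: labels
    Name: efficientnet-v2-b0
    # Input/output layer names and model tags are deliberately omitted:
    # they are read from the saved_model file. Other missing fields
    # (e.g. thresholds) fall back to default values.
    Model: efficientnet-v2-tensorflow2-imagenet1k-b0-classification-v2.tar.gz
```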
@raystlin Thanks a lot! This should help @graciousgrey get started with testing and documenting the improvements. 📚
It's possible that we'll have some follow-up questions for you, though. For example, did you implement or test your changes with other model types, like "nsfw" or "face"? Or, are these not yet supported or tested, so we should focus on labels only?
Based on our previous conversations, I assume that using TensorFlow with the "caption" model type to generate image descriptions isn't possible, as it would require a lot of additional work. That's why I recently implemented an Ollama API integration, which should suffice for now.
However, one major pain point for users is that the face detection provided by the https://github.com/esimov/pigo package isn't reliable and often misses faces. Would it make sense to use one of these Object Detection models to locate faces (or other objects) in images instead? If so, how difficult would it be to integrate it with our current code base? 🤔
@raystlin commented on GitHub (Aug 5, 2025):
It has only been fully done for labels; however, some parts, like extracting model information from saved_model, have been applied to nsfw and faces. Fully porting it to nsfw should be quite straightforward, as it is really just another classification model.

If I remember correctly, nsfw applies some kind of preprocessing to the image using an on-the-fly TensorFlow layer composition that should be looked into carefully, but apart from that, the rest should be more or less the same.

faces is a completely different thing. Changing the model would imply throwing away the stored vectors and recalculating everything. Also, I guess that if the new model produces vectors of a different size, some parts of the code would need to be changed...

For captions I did not do anything, but I could look into it if you give me a model that you think performs well. I would also say that the model should not be too big...

Finally, migrating from pigo to a face detection model is something I could do. Maybe we could open a new issue for that. I know some models that could do the job, but maybe you have some suggestions.
@lastzero commented on GitHub (Aug 5, 2025):
Yes, it's essentially the same as label detection, i.e. image classification, just with a smaller set of labels and some post-processing to determine whether an image should be flagged as NSFW or rejected when being uploaded. Our current NSFW model is also not very good, so if there's a better model that's easier to integrate (e.g. one that's more similar to the models that are already supported), then we should use that instead. Even a model with a single percentage output value would do.
True, it would only be worth looking into that later, once we have a migration strategy and another model that creates better embeddings. For now, face detection and vector comparison/clustering are the larger issues.
Sure, but it's not urgent since we have the Ollama integration, which works reasonably well. It might be difficult to achieve better results or performance!?
Awesome! That would be a great addition if you have time to work on it. Face detection is a huge pain point for our users right now, and I think this could significantly improve things while also making it possible to detect other objects of interest. The database tables are already prepared for that, only the data is missing. 🧐
@lastzero commented on GitHub (Aug 8, 2025):
@raystlin @graciousgrey These changes will load the models from assets/models instead of directly from the assets folder. This should make it easier for us to find, use, and add models.

You can prefix the directory names of models that you don't want to include in a release with an underscore, e.g. assets/models/_resnet.

Please let me know if this works for you or if you experience any problems, like broken tests. Thank you!
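The underscore convention described above can be sketched as a few shell commands (the model directory names here are examples, not the actual repository contents):

```shell
# Example layout: directories under assets/models/ are bundled with a
# release, while names starting with "_" are excluded.
mkdir -p assets/models/nasnet assets/models/_resnet

# To exclude an existing model from the next release, prefix it with "_":
mv assets/models/nasnet assets/models/_nasnet

# Show which model directories would now be excluded:
ls -d assets/models/_*
```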
@graciousgrey commented on GitHub (Sep 22, 2025):
Related documentation can be found here: https://docs.photoprism.app/developer-guide/vision/custom-classification-model/
@lastzero commented on GitHub (Oct 1, 2025):
Note that with the latest changes in our preview build, you should be able to run custom TensorFlow models in the background after indexing or on a custom schedule:
I have only tested this with Ollama models so far, but there is no reason it shouldn't work with TensorFlow models as well.