mirror of https://github.com/photoprism/photoprism.git (synced 2026-03-02 22:57:18 -05:00)
AI: Add a CLIP-powered semantic search #941
Originally created by @flozi00 on GitHub (May 12, 2021).
As a user, I want to perform a CLIP-powered semantic search using a local model so that I can find pictures via natural language queries without relying on external cloud services.
Implementation Steps:
- Add `ModelTypeClip` and a default config in `internal/ai/vision/models.go`; expose an ONNX loader using the existing runtime glue. Models live under `ModelsPath/clip/<name>`.
- Hook into `internal/workers/vision.go` and `internal/workers/meta.go`, respecting the `Run` schedule. Batched embedding generation; matryoshka down-projection to 64-d optional via config.
- Embedding store in `internal/entity/clip_embedding.go`; migrations per DB driver. On SQLite, Phase 1 is persistence only (search disabled).
- Extend `search.Query` to accept `ClipQuery` and `ClipWeight` (float).
- `ORDER BY distance` using driver-specific SQL; then apply the existing ordering as secondary.

Model Comparison
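The optional matryoshka down-projection mentioned in the steps above is essentially a truncate-and-renormalize operation. A minimal sketch in Go (the `downProject` helper is illustrative, not an existing PhotoPrism function):

```go
package main

import (
	"fmt"
	"math"
)

// downProject truncates a matryoshka-trained embedding to its first
// dim components and L2-renormalizes the result, so cosine similarity
// on the shorter vector remains meaningful. Hypothetical helper, not
// part of the PhotoPrism codebase.
func downProject(v []float32, dim int) []float32 {
	if dim > len(v) {
		dim = len(v)
	}
	out := make([]float32, dim)
	copy(out, v[:dim])
	var norm float64
	for _, x := range out {
		norm += float64(x) * float64(x)
	}
	norm = math.Sqrt(norm)
	if norm == 0 {
		return out
	}
	for i := range out {
		out[i] = float32(float64(out[i]) / norm)
	}
	return out
}

func main() {
	emb := make([]float32, 512) // e.g. a full CLIP embedding
	for i := range emb {
		emb[i] = float32(i%7) * 0.1
	}
	short := downProject(emb, 64)
	fmt.Println(len(short)) // 64
}
```

Storing the 64-d projection instead of the full vector trades a little retrieval quality for an ~8x smaller index, which matters for the SQLite/low-resource deployments PhotoPrism targets.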
Code Map
- `internal/api/photos_search.go` and `internal/api/photos_search_geo.go` (extend request struct & pipeline).
- `internal/ai/vision` (model load/run), `internal/entity` (embedding store).
- `internal/config/options.go` (flags/env), `vision.yml`.
- `internal/commands/*.go`.
- `internal/workers/vision.go`, `internal/workers/meta.go`.

Acceptance Criteria
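The proposed `ClipQuery`/`ClipWeight` parameters and the driver-specific distance ordering could be sketched as below. The struct fields mirror the proposal above; the SQL fragments (pgvector's `<=>` cosine-distance operator for Postgres, a `VEC_DISTANCE()` function for MariaDB) are assumptions about the target drivers, not existing PhotoPrism code:

```go
package main

import "fmt"

// ClipSearch holds the hypothetical new search parameters described in
// this issue; the field names follow the proposal, not current code.
type ClipSearch struct {
	ClipQuery  string  // natural-language query to embed
	ClipWeight float32 // blend between CLIP distance and default order
}

// distanceOrder returns a driver-specific ORDER BY fragment for vector
// distance, with "?" as the placeholder for the query embedding.
func distanceOrder(driver, column string) (string, error) {
	switch driver {
	case "postgres":
		return fmt.Sprintf("%s <=> ? ASC", column), nil
	case "mysql", "mariadb":
		return fmt.Sprintf("VEC_DISTANCE(%s, ?) ASC", column), nil
	default:
		// SQLite Phase 1: embeddings are persisted, but distance
		// ordering is disabled.
		return "", fmt.Errorf("vector search not supported for driver %s", driver)
	}
}

func main() {
	clause, err := distanceOrder("postgres", "clip_embedding")
	if err != nil {
		panic(err)
	}
	// Distance first, then the existing ordering as secondary.
	fmt.Println("ORDER BY " + clause + ", taken_at DESC")
}
```

Keeping the existing ordering as a secondary sort key (rather than blending scores in Go) lets each driver do the heavy lifting and keeps pagination stable.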
Documentation & References
@lastzero commented on GitHub (May 14, 2021):
Looks cool, but we're drowning in work right now. Maybe later this year?
@flozi00 commented on GitHub (May 14, 2021):
If you're okay with it, I could provide an Elasticsearch-based API with simple CRUD operations that you could use for search. Then it would only be a small change on your side, plus another service in Docker Compose.
@lastzero commented on GitHub (May 14, 2021):
You're most welcome to contribute! Be aware that even merging a pull request is going to take some time right now :/
@flozi00 commented on GitHub (May 22, 2021):
Do you have a channel where I can contact you? I'd really like to provide some APIs with useful features; maybe we could discuss them.
@lastzero commented on GitHub (May 23, 2021):
You can find us on Matrix if you follow the link to our community chat.
@tknobi commented on GitHub (Feb 3, 2022):
Hello all,
since this is my first interaction with PhotoPrism, I would like to take this opportunity to thank you @lastzero (and all the other contributors :) ) for this wonderful piece of software. I have tested many different photo galleries; PhotoPrism runs by far the most stable and fastest, and offers (almost) everything I need.
Now about this issue: after getting in touch with CLIP and once again desperately trying to find an image by its content in PhotoPrism, I had the same idea as @flozi00 (thanks for that!), and I want to give my thanks by contributing a CLIP-based content search.
As already mentioned, the CLIP model can only be addressed in Python. So I propose to provide a CLIP API that can encode an image or text into CLIP embeddings via a REST interface. I will keep the code on the Python side minimal, to keep it maintainable for Go developers and to avoid the need for a separate test infrastructure.
SQL is not suitable for storing the embeddings, because there is (to my knowledge) no efficient way to perform a nearest-neighbor search. However, based on my research (including this great post) and experience, I wouldn't use an Elasticsearch instance, as I think it adds too much overhead. Instead, I would recommend Qdrant, which only requires a single container of ~42 MB.
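For context, without a vector index a database can only rank results by scanning every stored embedding, which is exactly what engines like Qdrant avoid by using approximate-nearest-neighbor structures such as HNSW. A brute-force cosine scan in Go illustrates that O(n·d) baseline:

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// nearest returns the index of the stored embedding most similar to
// the query: a full O(n·d) scan, which an ANN index replaces with a
// sublinear graph or tree traversal.
func nearest(query []float32, store [][]float32) int {
	best, bestSim := -1, -2.0
	for i, v := range store {
		if s := cosine(query, v); s > bestSim {
			best, bestSim = i, s
		}
	}
	return best
}

func main() {
	store := [][]float32{{1, 0}, {0, 1}, {0.9, 0.1}}
	fmt.Println(nearest([]float32{1, 0.05}, store)) // 0
}
```

The exact scan stays fast for a few thousand photos, so a dedicated vector store mainly pays off for large libraries.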
@xyb commented on GitHub (Nov 16, 2025):
UForm is also a good choice; the models are small, yet very fast.
@xyb commented on GitHub (Nov 28, 2025):
The official documentation for UForm and USearch is somewhat outdated, so I created a demo to show how to use UForm in practice: https://github.com/xyb/uform-image-search
@lastzero commented on GitHub (Nov 28, 2025):
@xyb Thanks a lot! I updated the issue description above to include guidance on the implementation steps and acceptance criteria. It also includes the links you shared.