mirror of
https://github.com/photoprism/photoprism.git
synced 2026-03-02 22:57:18 -05:00
AI: Integrate a model for Optical Character Recognition (OCR) #709
Originally created by @Shamshala on GitHub (Jan 16, 2021).
Hi, when I came across the "Scans" mark, I wondered what its main purpose was. After I found out in the quick guide, an idea popped into my mind: wouldn't it be possible to also include photos/scans of documents? And since image classification/object detection is already implemented and face recognition is being considered (or on the way), what about OCR? 😈 It would probably also be helpful for reading street signs and similar text.
Support for more built-in AI/ML models depends on upgrading the TensorFlow library we use from v1.15 to v2.x first.
In addition, we have started working on a microservice that will allow us to use advanced computer vision models through a REST API (also includes support for faster inference when indexing new pictures e.g. with Nvidia hardware):
It seems possible (and we have already talked about it internally) to have an API endpoint for OCR there as well. Any contributions to this would be much appreciated! 🤗
Related Issues:
@lastzero commented on GitHub (Jan 16, 2021):
Already tried, not that easy. Need to extract text areas first, can't just run OCR on a complete image.
@FadingArabChristians commented on GitHub (Dec 29, 2021):
Has this been added? It would be absolutely perfect for my use case atm.
@graciousgrey commented on GitHub (Dec 30, 2021):
No, this has not yet been added.
You can find an overview of what is planned next on our roadmap: https://github.com/photoprism/photoprism/projects/5.
Ideas about libraries for text area extraction and OCR are very welcome :)
@mbethke commented on GitHub (Sep 26, 2022):
Having quite a few photos of random stuff with more or less amusing text on them in my collection, I'd love that feature! So I've been playing around with DeepDetect a little, and apart from eating hefty chunks of memory, it performs quite well. I set up a container similar to their quickstart instructions (just using docker-compose), added their word-detect model and tried it with a few images from the web, like
It spits out a JSON file with the bounding boxes, to visualize:
That will probably still need some work like coalescing overlapping boxes before the results can be fed to tesseract or something, but maybe it's a start?
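The "coalescing overlapping boxes" step mentioned above could look roughly like the following. This is only a minimal sketch: it assumes each detection has already been reduced to an axis-aligned `(xmin, ymin, xmax, ymax)` tuple, which is not necessarily the layout of the JSON that DeepDetect returns, and the function names are made up for illustration.

```python
from typing import List, Tuple

# Assumed box layout: (xmin, ymin, xmax, ymax) in pixel coordinates.
Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes intersect (touching edges count as overlap)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(a: Box, b: Box) -> Box:
    """Smallest box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def coalesce(boxes: List[Box]) -> List[Box]:
    """Merge boxes until no two boxes in the result overlap."""
    merged: List[Box] = []
    for box in boxes:
        current = box
        changed = True
        # A grown box may start overlapping boxes it did not touch before,
        # so keep absorbing until a full pass finds nothing new.
        while changed:
            changed = False
            remaining: List[Box] = []
            for other in merged:
                if overlaps(current, other):
                    current = union(current, other)
                    changed = True
                else:
                    remaining.append(other)
            merged = remaining
        merged.append(current)
    return merged
```

Each merged box could then be cropped out of the image and passed to an OCR engine one region at a time. The quadratic loop is fine for the handful of boxes found in a typical photo.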
@daniellandau commented on GitHub (Feb 18, 2023):
Didn't notice this issue before I opened #3206 above. As said there:
In the past I've used Tesseract with manually drawn boxes. By comparison, EasyOCR seems to perform very well when just given a full image, so "can't just run OCR on a complete image" doesn't seem to be totally true anymore. EasyOCR gives both the bounding boxes and the text in each of them, but for PhotoPrism I'd be happy to just have the raw text searchable so I can find the image and then read the actual text myself from the image.
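To illustrate the "raw text searchable" idea, here is a small sketch that flattens OCR detections into a single string. It assumes the detections have the shape that EasyOCR's `readtext()` returns by default, a list of `(bounding_box, text, confidence)` entries where the bounding box is four `[x, y]` corner points. The function name, the confidence threshold and the naive reading order are illustrative assumptions, and the sketch does not call EasyOCR itself.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]                          # [x, y]
Detection = Tuple[Sequence[Point], str, float]   # (corner points, text, confidence)


def searchable_text(detections: List[Detection], min_confidence: float = 0.3) -> str:
    """Join detected text fragments into one string for full-text search.

    Fragments below min_confidence (an arbitrary, assumed threshold) are dropped.
    The rest are ordered top-to-bottom, then left-to-right, by the top-left
    corner of their bounding box, which is only a rough reading order.
    """
    kept = [d for d in detections if d[2] >= min_confidence and d[1].strip()]
    kept.sort(key=lambda d: (min(p[1] for p in d[0]), min(p[0] for p in d[0])))
    return " ".join(d[1].strip() for d in kept)


# Hand-written sample data in the assumed format, not real OCR output.
sample = [
    ([[50, 10], [90, 10], [90, 30], [50, 30]], "STREET", 0.90),
    ([[0, 10], [40, 10], [40, 30], [0, 30]], "MAIN", 0.95),
    ([[0, 50], [40, 50], [40, 70], [0, 70]], "xx", 0.10),
]
print(searchable_text(sample))  # MAIN STREET
```

The bounding boxes are discarded after sorting, since only the text is needed to find the picture again.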
@ant0nwax commented on GitHub (Mar 2, 2023):
I am currently looking for two additional labels that are missing for my daily use:
Drawings and Paintings. I used them a lot on my former photo cloud, and the AI there worked better. Also, OCR for text contained in photos would be a nice-to-have integration from the developers.
@ALEEF02 commented on GitHub (Aug 26, 2023):
What's the latest on OCR? I'm willing to put development time into this personally if needed. Searching for text in images was one of the main uses I got out of Google Photos and I'd love to see that functionality here.
@Bobbyjohnsonz commented on GitHub (Sep 10, 2023):
Agreed. I believe Mylio does this out of the box, but it is a paid product. It seems this could be implemented locally without a problem. This is a deciding factor for me. It’s such a powerful tool.
@ant0nwax commented on GitHub (Sep 10, 2023):
Do I understand correctly that it would be possible to integrate OCR into PhotoPrism, but that this has nothing to do with integrating Mylio, since Mylio and Google Photos are not open source? Is there perhaps another open source OCR tool already?
Could you name an open source OCR that deals with Photos already?
@ant0nwax commented on GitHub (Sep 10, 2023):
My research on AI
Question>
There are open source OCR tools to read text inside of images and make it searchable... https://www.hitechnectar.com/blogs/open-source-ocr-tools/ on this link we see some OCR open source tools. Which of these would work best for integrating it into an on-premises photoprism solution?
Answer>
Based on the search results, Tesseract OCR engine seems to be the best open source OCR tool for integrating it into an on-premises photoprism solution. Tesseract OCR engine is sponsored by Google and is considered one of the most accurate, freely available open-source systems available [1]. However, other open source OCR tools like Ocrad, EasyOCR, and OpenCV can also be used for basic OCR tasks and can be trained with your own data [2][3][5]. It is worth noting that PhotoPrism is an AI-powered photos app for the decentralized web that makes use of the latest technologies to tag and find pictures automatically without getting in your way [6].
@Fireflaker commented on GitHub (Jan 23, 2024):
Earlier I posted this on discussion:
May I ask for some recommendations on what is the best way to acquire OCR capability right now?
I am a student, but I would love to contribute what I can.
@graciousgrey commented on GitHub (Jan 25, 2024):
Thank you very much for offering your help! We are currently focusing on adding additional authentication options, so unfortunately we can't look into this in depth at this time. However, we will do our best to answer your questions. Please bear with us if it takes us a while to get back to you.
A good starting point would be to do some research and test the available libraries and create a decision matrix to find out which one would integrate best with PhotoPrism. Some points to consider are:
@Dobeemixer commented on GitHub (Dec 30, 2024):
Any updates on this? Searching for text in photos seems to be a big deal.
@lastzero commented on GitHub (Jan 13, 2025):
@Dobeemixer We will be happy to take another look at this once the upgraded user interface and new viewer are released:
Of course, if you know a good solution (or would like to find one and build a proof-of-concept), we are also happy to accept contributions. This would shorten the time to get it into a stable release.
@Brtrnd commented on GitHub (Nov 19, 2025):
I assume this can be used to achieve OCR:
https://github.com/photoprism/photoprism/discussions/4983
I guess one would ask the LLM whether there is text in the image and have it return a binary answer. If positive, an OCR engine could be run on the document.
Of course, OCR on unstructured documents is hard, so it will very probably read most of the words but mess them up if there's formatting.
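The two-stage idea above (first a yes/no check for text, then OCR only on positives) can be sketched as follows. Both stages are passed in as plain callables, because the thread does not settle on a concrete vision model or OCR engine. `has_text`, `run_ocr` and `extract_text` are hypothetical names, not PhotoPrism APIs.

```python
from typing import Callable, Optional


def extract_text(
    image_path: str,
    has_text: Callable[[str], bool],  # hypothetical gate, e.g. a vision model answering yes/no
    run_ocr: Callable[[str], str],    # hypothetical OCR step, e.g. a wrapper around an OCR engine
) -> Optional[str]:
    """Run the (expensive) OCR step only if the gate reports text in the image.

    Returns the recognized text, or None if the gate says there is no text
    or the OCR step produced nothing but whitespace.
    """
    if not has_text(image_path):
        return None
    text = run_ocr(image_path).strip()
    return text or None


# Usage with stand-in functions. A real setup would call a model and an OCR engine here.
calls = []


def fake_gate(path: str) -> bool:
    return path.endswith("sign.jpg")


def fake_ocr(path: str) -> str:
    calls.append(path)
    return "  NO PARKING \n"


print(extract_text("sign.jpg", fake_gate, fake_ocr))   # NO PARKING
print(extract_text("beach.jpg", fake_gate, fake_ocr))  # None
```

Keeping the gate separate means pictures without text never reach the OCR step, which matters when indexing a large library.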
@lastzero commented on GitHub (Nov 20, 2025):
@Brtrnd Absolutely! LLMs can do this, and it generally works well — depending on the language of the text and the model. The only remaining issue is that you might prefer to put the text in a new, dedicated OCR metadata field rather than in the Caption. Feel free to suggest alternative/better field names, and let us know what else you need for this feature request to be resolved.
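As a rough illustration of keeping recognized text apart from the Caption, the sketch below uses a separate field and searches both. The field name `ocr_text` and the `PhotoMetadata` class are purely hypothetical and do not reflect PhotoPrism's actual schema or a decided field name.

```python
from dataclasses import dataclass


@dataclass
class PhotoMetadata:
    """Hypothetical metadata record, not PhotoPrism's real data model."""
    caption: str = ""   # human- or model-written description
    ocr_text: str = ""  # assumed dedicated field for text recognized in the image


def matches(photo: PhotoMetadata, query: str) -> bool:
    """Case-insensitive substring search over the caption and the OCR text."""
    needle = query.casefold()
    return needle in photo.caption.casefold() or needle in photo.ocr_text.casefold()


photo = PhotoMetadata(caption="Street corner at night", ocr_text="NO PARKING 8am-6pm")
print(matches(photo, "parking"))  # True
print(matches(photo, "sunset"))   # False
```

With a separate field, re-running OCR can overwrite the recognized text without touching a caption the user may have edited by hand.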