Documents: Add support for indexing and searching PDFs #2206

Closed
opened 2026-02-20 01:07:56 -05:00 by deekerman · 2 comments
Owner

Originally created by @lastzero on GitHub (Oct 31, 2024).

Originally assigned to: @graciousgrey on GitHub.

As a user with PDF documents in my library, I would like PhotoPrism to recognise and index them so that I can find and view them from the web interface.

  • The first step is to add a new "document" media type to our backend. Additional changes will be required to find and index PDF files when scanning the library.
  • Once the frontend component framework upgrade (https://github.com/photoprism/photoprism/issues/3168) is complete, we can proceed with implementing a viewer (e.g. an iframe) to view the indexed documents from the web interface.
  • Making the text searchable is beyond the scope of this issue, but can be implemented at a later time.
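
To illustrate the first step, here is a minimal sketch of how a scanner could classify files by extension once a "document" media type exists. It is not taken from the PhotoPrism codebase: the `MediaType` enum, the `EXTENSIONS` table, and the `media_type` and `find_documents` helpers are hypothetical names chosen for this example, and Python is used only for brevity.

```python
from enum import Enum
from pathlib import Path


class MediaType(Enum):
    IMAGE = "image"
    VIDEO = "video"
    DOCUMENT = "document"  # the new media type proposed in this issue
    OTHER = "other"


# Hypothetical extension table; a real indexer would cover many more formats.
EXTENSIONS = {
    ".jpg": MediaType.IMAGE,
    ".jpeg": MediaType.IMAGE,
    ".png": MediaType.IMAGE,
    ".mp4": MediaType.VIDEO,
    ".mov": MediaType.VIDEO,
    ".pdf": MediaType.DOCUMENT,
}


def media_type(path):
    """Classify a file by its extension, ignoring case."""
    return EXTENSIONS.get(Path(path).suffix.lower(), MediaType.OTHER)


def find_documents(root):
    """Return all files below root that are classified as documents, sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and media_type(p) is MediaType.DOCUMENT
    )
```

With this shape, supporting another document format would only require a new entry in the extension table. Extracting text for search, which is out of scope here, would be a separate step.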

Sample Files:

  • https://dl.photoprism.app/samples/Formats/Document/PDF/
Author
Owner

@lastzero commented on GitHub (Mar 12, 2025):

I'm happy to announce that PDF indexing is now available for testing in our Development Preview (https://docs.photoprism.app/release-notes/#development-preview)!

However, before you proceed, please note the following:

  • PDF documents cannot yet be viewed directly in the UI with the current development version. The viewer only displays a thumbnail of the first page, so you need to click the download button to view the files locally.
  • We may be able to improve this by rendering an iframe in the viewer that loads the PDF, but it's not yet clear whether this makes sense from a UX perspective. For example, some documents may be very large and would load slowly even when you don't want to read them.

Here's a screenshot of the search interface with the new documents filter applied:

Image: https://github.com/user-attachments/assets/cb8f9946-bb08-4f82-b1e7-6b67a4a3ac0d
Author
Owner

@lastzero commented on GitHub (Mar 12, 2025):

Based on my tests, PDF indexing should also work in our Community Edition if you're using the official Docker image, so I've removed the plus-feature label. Of course, this is without any guarantee or personal support in case of problems. Help with testing would be much appreciated! ❤
