CLI: Provide a way to remove already indexed files from index #1018

Open
opened 2026-02-20 00:04:22 -05:00 by deekerman · 5 comments
Owner

Originally created by @rthomson83 on GitHub (Jul 8, 2021).

I've run into what I believe is an issue with files and folders that are excluded after they have already been indexed. The only way I can currently see to resolve it is to reset the database. I've scoured the docs and cannot find anything that covers this scenario.

Expected

Adding a folder or file exclusion to .ppignore should not only exclude it from future indexing but also purge any existing previews and folders from the index.

Actual

When a folder is added to .ppignore after an index has already run, its previews and folders are never removed.

Steps

  • Nextcloud folder mounted via Kubernetes volume mount
  • Run initial index
  • Add ppignore file in root of the mounted nextcloud folder
  • Add folder for exclusion to ppignore file
  • Re-run the index: all subfolders and previews covered by the exclusion are still present in the index.

@lastzero commented on GitHub (Jul 8, 2021):

There are photoprism purge and photoprism cleanup commands to refresh your index. However, they don't remove existing files matching (newly added) patterns in .ppignore. This is the same behaviour as with "git" (which uses .gitignore files), and it also avoids accidental data loss if you get the pattern wrong so that it matches too much (or everything with *). We see that as a big risk.

USAGE:
   photoprism [global options] command [command options] [arguments...]

COMMANDS:
   start, up         Starts web server
   stop, down        Stops web server (only in daemon mode)
   index             Indexes media files in originals folder
   import, mv        Moves files to originals folder, converts and indexes them as needed
   moments           Creates albums based on popular locations, dates and labels
   optimize          Starts metadata check and optimization
   purge             Removes missing files from search results
   cleanup           Removes orphan index entries and thumbnails
   copy, cp          Copies files to originals folder, converts and indexes them as needed
   convert           Converts originals in other formats to JPEG and AVC sidecar files
   resample, thumbs  Pre-renders thumbnails (significantly reduces memory and cpu usage)
   migrate           Initializes the index database if needed
   backup            Creates album and index backups
   restore           Restores album and index backups
   reset             Resets the index and removes sidecar files after confirmation
   config            Displays global configuration values
   passwd            Changes the admin password
   version           Shows version information
   status            Performs a server health check
   help, h           Shows a list of commands or help for one command

Advanced users can delete entries from their index via standard SQL, similar to updating them as described here:

https://github.com/photoprism/photoprism/issues/1395#issuecomment-871617580
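
For readers who want a feel for what a cautious manual deletion looks like, here is a minimal sketch using Python's built-in sqlite3 module. It is not PhotoPrism's schema: the `indexed_files` table, its `path` column, and the `delete_indexed_prefix` helper are all hypothetical names made up for illustration. It only shows the general pattern of counting the matching rows first (a dry run) and deleting them inside a transaction afterwards. Back up the real database before trying anything similar.

```python
import sqlite3


def delete_indexed_prefix(conn, prefix, dry_run=True):
    """Count, and optionally delete, rows whose path lies under prefix.

    Assumption: `indexed_files` and `path` are hypothetical names,
    not PhotoPrism's real schema.
    """
    # Escape LIKE wildcards so a folder name containing % or _ matches literally.
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = escaped.rstrip("/") + "/%"
    # Note: SQLite's LIKE is case-insensitive for ASCII letters by default.
    count = conn.execute(
        "SELECT COUNT(*) FROM indexed_files WHERE path LIKE ? ESCAPE '\\'",
        (pattern,),
    ).fetchone()[0]
    if not dry_run:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "DELETE FROM indexed_files WHERE path LIKE ? ESCAPE '\\'",
                (pattern,),
            )
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE indexed_files (path TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO indexed_files VALUES (?)",
        [("private/a.jpg",), ("private/sub/b.jpg",), ("2021/c.jpg",)],
    )
    print("would delete:", delete_indexed_prefix(conn, "private"))
    print("deleted:", delete_indexed_prefix(conn, "private", dry_run=False))
```

Defaulting to `dry_run=True` means a prefix that matches too much shows up as an unexpectedly large count before any row is removed.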


@rthomson83 commented on GitHub (Jul 8, 2021):

I guess it's risky if you're actually deleting the original files, but the way I'm thinking of it, it's just the actual index entries / previews.
That would be the equivalent, in git terms, of deleting from the index and keeping the local copy.
I mean I'm fine messing around with the SQL but it just felt like there ought to be a solution with a little less friction.


@lastzero commented on GitHub (Jul 19, 2021):

In the worst case you have to reindex everything, which may take days on a Raspberry Pi. You may also lose metadata that is stored only in your index. We can look into this when we are done with high-priority feature requests such as facial recognition and batch edit.


@charles-997 commented on GitHub (Sep 23, 2021):

> The only current way I can see is to reset the database to resolve

Do you mean performing a "complete rescan", i.e. re-indexing the whole database, or something else? Although not ideal, I don't mind having to re-index my whole database.

> I guess it's risky if you're actually deleting the original files but the way I'm thinking of, it's just the actual index entries / previews.

I would agree that removing existing files (from photoprism, not from the originals on disk) based on updates to .ppignore file(s) during a standard index is not actually a risky procedure. If the originals still exist on disk, then I believe no data would actually be lost? Please correct me if I am wrong.

That being said, I do agree with @lastzero that the best method would be to create a new command similar to photoprism purge and photoprism cleanup that will update the library based on any changes/additions to .ppignore file(s).
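
To make the idea concrete, here is a rough sketch of the matching step such a command would need, written as a dry run that only reports what would be dropped. It is an assumption-laden illustration, not PhotoPrism code: `load_patterns`, `is_ignored` and `plan_removals` are hypothetical helpers, and the matching uses Python's `fnmatch` on individual file and folder names, which may differ from how PhotoPrism actually interprets .ppignore.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def load_patterns(text):
    """Return the non-empty, non-comment lines of an ignore file's text."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_ignored(rel_path, patterns):
    """True if any path component of rel_path matches any pattern.

    Assumption: patterns are simple shell-style globs applied to single
    file or folder names, a simplification of real ignore rules.
    """
    parts = PurePosixPath(rel_path).parts
    return any(fnmatch(part, pat) for part in parts for pat in patterns)


def plan_removals(indexed_paths, patterns):
    """Dry run: list the index entries that would be dropped, touching nothing."""
    return [p for p in indexed_paths if is_ignored(p, patterns)]


if __name__ == "__main__":
    ignore_text = "# folders that should not be visible\nprivate\n*.tmp\n"
    indexed = [
        "2021/07/holiday.jpg",
        "private/scan.jpg",
        "2021/private/notes.png",
        "2021/07/edit.tmp",
    ]
    for path in plan_removals(indexed, load_patterns(ignore_text)):
        print("would remove from index:", path)
```

Printing the plan before acting on it also speaks to the concern raised earlier in the thread: a pattern such as `*` would list every indexed path, which is easy to spot before anything is removed.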


@ndfan77 commented on GitHub (Jun 16, 2024):

Just wanted to chime in that IMHO this should NOT be low priority, and should be at the same priority as face recognition, etc.

It's just as important not to expose content/folders that shouldn't have been visible to begin with, as it is to have good face recognition results, etc.

How could it be considered practical or acceptable to expect a user to go through the entire Originals folder looking for files/folders that shouldn't be exposed before the first index? And then to have no easy way to correct what shouldn't be exposed after the first index, without losing all UI/metadata changes and starting completely over?! (That's just almost crazy! :)
