Indexer: Stand alone indexer #822

Author

Owner

@lastzero commented on GitHub (Feb 26, 2021):

Well, indexing and transcoding media files requires more than a small script. Installing all dependencies manually is even more work and needs the same amount of storage.

@lastzero commented on GitHub (Mar 1, 2021):

Is this really a feature request? What specifically should be developed and how?

@darrepac commented on GitHub (Mar 1, 2021):

It is not a bug; for me it is a new feature, which is why I filed it as a feature request.
Then, I guess the goal now is to see what others think about it: if it makes sense only to me, we can conclude that it doesn't really make sense for the community.

@darrepac commented on GitHub (Mar 8, 2021):

I have pictures and storage folders mounted on a NAS and so shared across my network.
If I install PhotoPrism on Ubuntu on computer A (an i5 laptop, so x86) and run a full scan, can I then just run PhotoPrism on computer B (a Raspberry Pi, so ARM) by pointing it to the same pictures and storage mount points? Or are those files platform-dependent?

@cesarblancg commented on GitHub (Mar 19, 2021):

> I have pictures and storage folders mounted on a NAS and so shared across my network.
> If I install Photoprism on Ubuntu in computer A (i5 laptop so x86) and run a full scan.
> Can I just run Photoprism on computer B (Raspberry, so Arm) by pointing to the same pictures and storage mount point?? or such files are platform dependent?

I have the same problem.

I have an RPi 3B with Syncthing and other self-hosted solutions, and I'm looking for a Google Photos alternative. I'm thinking about two instances of PhotoPrism (one on my RPi and the other on my PC). The RPi will have the MariaDB server and one instance of PhotoPrism, and my PC will have the other PhotoPrism instance, with the originals folder under Syncthing, connecting to the RPi's MariaDB server.

Could I connect the mobile app to the PhotoPrism instance on the RPi and upload my media, and have it stay unindexed until I turn on the PC?

@lastzero commented on GitHub (Mar 19, 2021):

In theory yes, although that's a pretty complex setup and the Pi is perfectly capable of indexing a few files you just uploaded.

@sigaloid commented on GitHub (Dec 11, 2021):

I think a short tutorial on how to index with a stronger device would be helpful. I have a nice Ryzen and 16 GB of RAM on my main PC, but I want to eventually toss it on a Pi and not think about it. The initial indexing is a serious burden with a lot of photos plus facial recognition etc. Importing a few hundred when I sync shouldn't be too hard, but I have thousands of photos to import on the first run onto a Pi 2 with 1 GB of RAM.

@alexislefebvre commented on GitHub (Dec 17, 2021):

I was able to launch a PhotoPrism container on another computer by following these steps.

Let's say that you have 2 computers: the server is the one that already runs PhotoPrism, and we'll use our local computer as a client to run PhotoPrism.

⚠️ Prerequisites:

- you have Docker on both computers and you start PhotoPrism with `docker-compose`
- the folders mounted in the photoprism container are also mounted on your other computer (I use Samba for this):
  - the folder(s) with your media content
  - the storage folder
- the storage folder is writable by the client too
- you understand every step below

On your server:

1. Do a backup, just in case
2. Stop the photoprism container (we don't want parallel access to the same database)
3. Open the port for the database by updating docker-compose.yml. My local network is on the 192.168.1.X range, so I bind to that address to avoid exposing the port to the Internet:

```yaml
  photoprism-db:
    # …
    ports:
      - 192.168.1.6:13306:3306
```

4. Restart the photoprism-db container: `docker-compose up --detach photoprism-db`

On your client:

We have to run PhotoPrism, and it needs to access the same paths as when it's running on the server.

1. Copy the docker-compose.yml file and keep only the photoprism service
2. Remove the network
3. Update the PHOTOPRISM_DATABASE_DSN variable to use the database on the server
4. Update the volumes:
   - the part before `:` has to be updated to point to where the server's folders are mounted on the client
   - the part after `:` is kept as is

You should have something like this:

```yaml
version: '3.7'

services:
  photoprism:
    # …
    environment:
      # …
      PHOTOPRISM_DATABASE_DSN: "photoprism:photoprism@tcp(192.168.1.6:13306)/photoprism?charset=utf8mb4,utf8&parseTime=true"
    volumes:
      - "/mnt/nas/photos/:/photoprism/originals/"
      - "/mnt/nas/photoprism/storage/:/photoprism/storage" # keep database files
```

Start the photoprism container: `docker-compose up --detach photoprism`.

Access the PhotoPrism instance through the port you defined. You should see the same media as before.

I didn't do any benchmark, but the network may be the bottleneck with this setup.
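
Before starting the container on the client, it may help to check that the mounts from the prerequisites are actually in place. The following is only a minimal sketch under that assumption: the function names `check_writable` and `preflight` are made up for illustration and are not part of PhotoPrism.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the client; not part of PhotoPrism.

# Succeeds only if the directory exists and the current user can write to it.
check_writable() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Usage: preflight ORIGINALS_DIR STORAGE_DIR
# Prints one line per problem and returns non-zero if anything is wrong.
preflight() {
    rc=0
    if [ ! -d "$1" ]; then
        echo "originals folder not mounted: $1"
        rc=1
    fi
    if ! check_writable "$2"; then
        echo "storage folder missing or not writable: $2"
        rc=1
    fi
    return "$rc"
}
```

With the example paths above, this could be used as `preflight /mnt/nas/photos /mnt/nas/photoprism/storage && docker-compose up --detach photoprism`, so the container only starts when both folders look usable.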

@laterwet commented on GitHub (Jan 1, 2022):

The issue is not only initial indexing, but also a complete rescan (re-indexing). Ideally, the indexer would support distributed computation, so we could register all available machines. This could be done with a simplified Docker image that only creates thumbnails and runs classification as per-image tasks. The workers could communicate with the server via a simple API and queue service.

I have a photo library of 43k+ photos and it would take about 4 days for initial indexing. Using multiple PCs could reduce the time significantly, given a fast network.

I have good experience in distributed programming and could possibly contribute to this kind of feature if there is enough interest.

@lastzero commented on GitHub (Jan 2, 2022):

Our 2-core Intel Core i3 can index 100k+ files in less than a day. I would recommend buying better hardware instead of requiring much more complex (and expensive) software architecture to solve performance problems with small libraries. Development time is our scarcest resource.

@lastzero commented on GitHub (Jan 2, 2022):

By the way, you can already provide the index command with different subfolders as argument and run it on different computers. The only drawback is that the other computer has to access storage over the network, which can slow it down considerably. Overall, I don't think it's worth the effort for most users.
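
As a rough sketch of the subfolder approach: each computer runs the indexer only for its share of the top-level folders. The folder names below are invented, and the `docker-compose exec photoprism photoprism index` form assumes the service is named `photoprism` in the compose file. The helper `index_cmds` is hypothetical and only prints the commands so they can be reviewed before anything runs.

```shell
#!/bin/sh
# Print (do not run) one index command per subfolder of originals.
# Assumes a compose service named "photoprism"; folder names are examples.
index_cmds() {
    for folder in "$@"; do
        printf "docker-compose exec photoprism photoprism index '%s'\n" "$folder"
    done
}

# Computer A could take the older folders ...
index_cmds 2019 2020
# ... while computer B, using the same database and mounts, takes the rest:
index_cmds 2021
```

Splitting by folder keeps the two runs from working on the same files, but both computers still read the originals over the network unless one of them holds them locally.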

@gadolf66 commented on GitHub (Jan 19, 2022):

Hello, I'm new to the forum, so if you feel it's better to create a new thread, just let me know.
I want to work on a setup similar to others:
- server 1 - Debian - has four USB drives to store the images, shared with the private network (Samba)
- server 2 - Debian - PhotoPrism

I wonder if it's really necessary to install PhotoPrism on both machines, or whether it is enough to add a volume on the PhotoPrism server (server 2) that points to the network share on server 1.

Any thoughts?

Thanks in advance.

@lastzero commented on GitHub (Jan 19, 2022):

Local data storage is a big advantage / necessity when it comes to performance and data security:

- https://docs.photoprism.app/getting-started/troubleshooting/performance/#storage
- https://docs.photoprism.app/getting-started/troubleshooting/mariadb/#corrupted-files

If necessary, only originals should be stored on a network drive or a slow HDD. If you want to discuss this in more detail, it's best to start a thread (or join an existing one) in GitHub Discussions.

@gadolf66 commented on GitHub (Jan 19, 2022):

@lastzero Thanks for the quick reply.

Yes, I was thinking only about originals. All other files would be local.

Now, the corrupted-files link you've provided says:

> If your database files get corrupted frequently, it is usually because they are stored on an unreliable device such as a USB flash drive, an SD card, or a shared network folder.

Just to make sure I understood it correctly (although it's quite obvious): by database files, you mean the database's internal files, like the ones below, not the originals, right?

![image](https://user-images.githubusercontent.com/46730249/150184617-2384049b-a502-46c8-a8da-3edd1df8bf99.png)

@mattodinaturas commented on GitHub (Feb 13, 2022):

> Our 2-core Intel Core i3 can index 100k+ files in less than a day. I would recommend buying better hardware instead of requiring much more complex (and expensive) software architecture to solve performance problems with small libraries. Development time is our scarcest resource.

Well, I have a 2012 i5-3330 and it takes forever to index: a folder of 50 GB takes a day or so. Definitely not your speed. If I were to rescan all 50k files I have now, it would take at least a week and a half of 24/7 work. I have now moved my storage folder and redone my Docker Compose setup, but it started indexing all over again and I don't have the time to wait. Is there anything I can do? I am running Docker with Hyper-V on Windows Server 2022. Thanks

@lastzero commented on GitHub (Feb 13, 2022):

Could be caused by file system virtualization. I know that accessing files on the host is extremely slow on macOS, so I wouldn't be surprised if Docker has similar issues on Windows.

With this background information, you can search Google for workarounds... others surely have similar performance problems. For example, there could be mount parameters to improve performance.

@mattodinaturas commented on GitHub (Feb 13, 2022):

> Could be caused by file system virtualization. I know that accessing files on the host is extremely slow on macOS, so I wouldn't be surprised if Docker has similar issues on Windows.
>
> With this background information, you can search Google for workarounds... others surely have similar performance problems. For example, there could be mount parameters to improve performance.

Thank you for your reply. I think you are right: when I deployed Nextcloud on a host folder it became really sluggish, and the only way to fix it was to put every system folder of NC on a dedicated Docker volume. Well, it's a bummer. I tried researching this issue just now, but most of the info is related to WSL rather than Hyper-V. However, if I use WSL2, is it right that I will not be able to access drives other than C:, as stated in the PhotoPrism documentation, or is there a workaround available? Thank you

@lastzero commented on GitHub (Feb 13, 2022):

A user reported that mounting drives other than C: works with WSL2. However, we do not have details, and when we tested it months ago, it did not work. Maybe Microsoft has released an update in the meantime?

@djjudas21 commented on GitHub (Jul 14, 2022):

I'm interested in this, but from a different perspective. I am running PhotoPrism in a 3-node Kubernetes cluster, and I am interested in multiple replicas to benefit from that compute during indexing.

- Is it possible to run multiple replicas of PhotoPrism as-is, without breaking something? All the persistent data is in a Kubernetes persistent volume, and obviously the database is in a separate MariaDB container. Would running 2 replicas in parallel be smart enough to pick jobs from the work queue when indexing, and share the workload, or would they tread on each other's toes?
- Ideally, I would like to see the frontend separated from the backend/indexer so I can run one replica of the frontend and N replicas of the indexer. I appreciate this is a big task, and maybe not that useful for people who are running in Docker. Maybe you could consider a single PhotoPrism image which accepts startup arguments to behave only as a frontend, or only as an indexer? Then Kubernetes users can specify their own architecture. Happy to raise this as a separate issue if you prefer. Thanks.

@centralhardware commented on GitHub (Jul 15, 2022):

You can already run the indexer from the CLI, with something like `sudo docker run --entrypoint photoprism index`. It's not exactly what you want, but it can be done right now.
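
The command as quoted has no image name, which `docker run` expects between its options and the command. Below is a sketch of a fuller form, assuming the `photoprism/photoprism` image and the example mounts from earlier in the thread. The env file name is hypothetical and would have to carry the same `PHOTOPRISM_*` settings (including the database DSN) as the running instance. The helper `indexer_cmd` is made up for illustration and only prints the command.

```shell
#!/bin/sh
# Build a one-off indexer command; prints it instead of running it.
# Usage: indexer_cmd IMAGE ENV_FILE ORIGINALS_DIR STORAGE_DIR
indexer_cmd() {
    printf 'docker run --rm --env-file %s -v %s:/photoprism/originals -v %s:/photoprism/storage --entrypoint photoprism %s index\n' \
        "$2" "$3" "$4" "$1"
}

# Example values; photoprism.env is a hypothetical file name.
indexer_cmd photoprism/photoprism photoprism.env /mnt/nas/photos /mnt/nas/photoprism/storage
```

Whether such a container can reach the database depends on the DSN in the env file and on the Docker network it runs in, so treat this as a starting point rather than a tested recipe.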
