Fail to start immich-machine-learning CUDA #5323

Closed
opened 2026-02-20 03:17:42 -05:00 by deekerman · 0 comments

Originally created by @SaltySaltPie on GitHub (Apr 7, 2025).

I have searched the existing issues, both open and closed, to make sure this is not a duplicate report.

  • Yes

The bug

System: Ubuntu 24.04 LTS
CPU: Ryzen 3400G, RAM: 64 GB, GPU: RTX 3080

Issue:
sudo docker compose up -d
[+] Running 21/21
✔ immich-machine-learning Pulled 98.7s
[+] Running 3/4
✔ Container immich_redis Running 0.0s
✔ Container immich_postgres Running 0.0s
⠏ Container immich_machine_learning Startin... 5.9s
✔ Container immich_server Running 0.0s
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown

The OS that Immich Server is running on

Ubuntu 24.04 LTS

Version of Immich Server

v1.131.3

Version of Immich Mobile App

Platform with the issue

  • [x] Server
  • [ ] Web
  • [ ] Mobile

Your docker-compose.yml content

#
# WARNING: Make sure to use the docker-compose.yml of the current release:
#
# https://github.com/immich-app/immich/releases/latest/download/docker-compose.yml
#
# The compose file on main may not be compatible with the latest release.
#

name: immich

services:
    immich-server:
        container_name: immich_server
        image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
        # extends:
        #   file: hwaccel.transcoding.yml
        #   service: cpu # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
        volumes:
            # Do not edit the next line. If you want to change the media storage location on your system, edit the value of UPLOAD_LOCATION in the .env file
            - ${UPLOAD_LOCATION}:/usr/src/app/upload
            - /media/jb/Main HDD/external/jim-photos/old-phones:/usr/src/app/external/jim/old-phones
            - /media/jb/Main HDD/external/jim-photos/wedding:/usr/src/app/external/jim/wedding
            - /media/jb/Main HDD/external/jane-photos:/usr/src/app/external/jane
            - /etc/localtime:/etc/localtime:ro
        env_file:
            - .env
        ports:
            - "2283:2283"
        depends_on:
            - redis
            - database
        restart: always
        healthcheck:
            disable: false

    immich-machine-learning:
        container_name: immich_machine_learning
        # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
        # Example tag: ${IMMICH_VERSION:-release}-cuda
        image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda
        extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
          file: hwaccel.ml.yml
          service: cuda # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
        volumes:
            - model-cache:/cache
        env_file:
            - .env
        ports:
            - 3003:3003
        restart: always
        healthcheck:
            disable: false

    redis:
        container_name: immich_redis
        image: docker.io/redis:6.2-alpine@sha256:2ba50e1ac3a0ea17b736ce9db2b0a9f6f8b85d4c27d5f5accc6a416d8f42c6d5
        healthcheck:
            test: redis-cli ping || exit 1
        restart: always

    database:
        container_name: immich_postgres
        image: docker.io/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
        environment:
            POSTGRES_PASSWORD: ${DB_PASSWORD}
            POSTGRES_USER: ${DB_USERNAME}
            POSTGRES_DB: ${DB_DATABASE_NAME}
            POSTGRES_INITDB_ARGS: "--data-checksums"
        volumes:
            # Do not edit the next line. If you want to change the database storage location on your system, edit the value of DB_DATA_LOCATION in the .env file
            - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
        healthcheck:
            test: pg_isready --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' || exit 1; Chksum="$$(psql --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' --tuples-only --no-align --command='SELECT COALESCE(SUM(checksum_failures), 0) FROM pg_stat_database')"; echo "checksum failure count is $$Chksum"; [ "$$Chksum" = '0' ] || exit 1
            interval: 5m
            start_interval: 30s
            start_period: 5m
        command:
            [
                "postgres",
                "-c",
                "shared_preload_libraries=vectors.so",
                "-c",
                'search_path="$$user", public, vectors',
                "-c",
                "logging_collector=on",
                "-c",
                "max_wal_size=2GB",
                "-c",
                "shared_buffers=512MB",
                "-c",
                "wal_compression=on",
            ]
        restart: always

volumes:
    model-cache:

Your .env content

# You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables

# The location where your uploaded files are stored
# UPLOAD_LOCATION=./library
UPLOAD_LOCATION=/media/jb/Main HDD/immich/upload
# The location where your database files are stored
# DB_DATA_LOCATION=./postgres
DB_DATA_LOCATION=/media/jb/Main HDD/immich/db_data

# To set a timezone, uncomment the next line and change Etc/UTC to a TZ identifier from this list: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List
TZ=Asia/Bangkok

# The Immich version to use. You can pin this to a specific version like "v1.71.0"
IMMICH_VERSION=v1.131.3

# Connection secret for postgres. You should change it to a random password
# Please use only the characters `A-Za-z0-9`, without special characters or spaces
DB_PASSWORD=aSrE2UvvEHr9d7

# The values below this line do not need to be changed
###################################################################################
DB_USERNAME=postgres
DB_DATABASE_NAME=immich

Reproduction steps

  1. Install Ubuntu 24.04 (nvidia-smi reports NVIDIA-SMI 550.120, Driver Version: 550.120, CUDA Version: 12.4).
  2. Install the NVIDIA Container Toolkit.
  3. Run docker compose up -d
    ...
    The machine learning container fails to start with this error:
    Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
    nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
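
The error in step 3 says that nvidia-container-cli could not load libnvidia-ml.so.1. As a diagnostic sketch only (it is not part of the original report), the check below looks for that library name in the host's dynamic linker cache. `has_lib` is a hypothetical helper name, and it is an assumption that a host with a working driver install lists the library in `ldconfig -p`.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given library name appears in the
# linker-cache listing read from stdin (for example the output of `ldconfig -p`).
has_lib() {
    # $1 = library soname, e.g. libnvidia-ml.so.1; -F matches it literally
    grep -F -q -- "$1"
}

# Run against the live linker cache on the host (not inside a container).
# Skipped silently when ldconfig is not on PATH.
if command -v ldconfig >/dev/null 2>&1; then
    if ldconfig -p | has_lib "libnvidia-ml.so.1"; then
        echo "libnvidia-ml.so.1: found in linker cache"
    else
        echo "libnvidia-ml.so.1: NOT found in linker cache"
    fi
fi
```

If the library is reported as not found on the host, the problem probably lies with the driver installation or library path, not with the Immich compose files. If it is found, the container runtime setup is the next place to look.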

Relevant log output


Additional information

Thanks for the awesome app. I recently switched from Windows 11 to Ubuntu, so I'm not sure what I missed here.
On Windows 11, the app worked flawlessly.
When I switched to Ubuntu, I basically reused the same docker compose and ML files, with the data paths edited to match the new Linux paths.
