[ML] System malfunction while using RKNN. #5236

Closed
opened 2026-02-20 03:16:14 -05:00 by deekerman · 3 comments

Originally created by @raptr55 on GitHub (Mar 28, 2025).

I have searched the existing issues, both open and closed, to make sure this is not a duplicate report.

  • Yes

The bug

When I search for something in the search bar, an SSD is unmounted, which blocks the system.

The OS that Immich Server is running on

Debian GNU/Linux 12 (bookworm)

Version of Immich Server

1.130.3

Version of Immich Mobile App

'

Platform with the issue

  • [x] Server
  • [x] Web
  • [ ] Mobile

Your docker-compose.yml content

#
# WARNING: Make sure to use the docker-compose.yml of the current release:
#
# https://github.com/immich-app/immich/releases/latest/download/docker-compose.yml
#
# The compose file on main may not be compatible with the latest release.
#

name: immich-app

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    extends:
      file: hwaccel.transcoding.yml
      service: rkmpp # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - immich-app.env
    ports:
      - 2283:2283
    depends_on:
      - redis
      - database
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-rknn
    extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
      file: hwaccel.ml.yml
      service: rknn # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
      - immich-app.env
    restart: always

  redis:
    container_name: immich_redis
    image: docker.io/redis:6.2-alpine@sha256:d6c2911ac51b289db208767581a5d154544f2b2fe4914ea5056443f62dc6e900
    healthcheck:
      test: redis-cli ping || exit 1
    restart: always

  database:
    container_name: immich_postgres
    image: docker.io/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
    volumes:
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    healthcheck:
      test: pg_isready --dbname='${DB_DATABASE_NAME}' || exit 1; Chksum="$$(psql --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' --tuples-only --no-align --command='SELECT SUM(checksum_failures) FROM pg_stat_database')"; echo "checksum failure count is $$Chksum"; [ "$$Chksum" = '0' ] || exit 1
      interval: 5m
      start_interval: 30s
      start_period: 5m
    command: ["postgres", "-c" ,"shared_preload_libraries=vectors.so", "-c", 'search_path="$$user", public, vectors', "-c", "logging_collector=on", "-c", "max_wal_size=2GB", "-c", "shared_buffers=512MB", "-c", "wal_compression=on"]
    restart: always

volumes:
  model-cache:

Your .env content

MACHINE_LEARNING_RKNN_THREADS=3

Reproduction steps

  1. Enable RKNN hardware acceleration as explained in the documentation (https://v1.130.3.archive.immich.app/docs/features/ml-hardware-acceleration/).
  2. Start the machine learning job from the web interface.
  3. Wait for completion.
  4. Search for something in the search bar.
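
To see what the kernel reports at the moment step 4 triggers the failure, one option is to follow the kernel log in a second terminal while searching. This is a minimal sketch, assuming a systemd host where `journalctl -k -f` is available. The function name `storage_events` and the keyword list are illustrative choices and are not part of Immich.

```shell
# Hypothetical helper: keep only kernel log lines that mention the
# filesystem, the NVMe device, or the NPU driver. Reads from stdin.
storage_events() {
  grep --line-buffered -Ei 'btrfs|nvme|rknpu'
}

# Usage (start this first, then perform the search in the web UI):
#   journalctl -k -f | storage_events
```

The filter only narrows the output. It does not say which component is at fault, so the surrounding unfiltered lines may still be worth saving.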

Relevant log output

[ 96.907096] BTRFS info (device nvme0n1): first mount of filesystem 8c0d426a-71f5-4764-92a3-b326334aeb92
[ 96.907182] BTRFS info (device nvme0n1): using crc32c (crc32c-generic) checksum algorithm
[ 96.907206] BTRFS info (device nvme0n1): enabling ssd optimizations
[ 96.907215] BTRFS info (device nvme0n1): using free space tree
[ 96.908282] BTRFS error (device nvme0n1): devid 2 uuid 99506759-8f4f-495f-80c4-38cfca213afc is missing
[ 96.908650] BTRFS error (device nvme0n1): failed to read the system array: -2
[ 96.909028] BTRFS error (device nvme0n1): open_ctree failed
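
The three `BTRFS error` lines indicate that the kernel could not find one member device of the filesystem (`devid 2 ... is missing`) and therefore refused the mount (`open_ctree failed`). A small sketch for pulling the same information out of a longer log, such as a journalctl dump, follows. The function names are hypothetical and the patterns are based only on the message format shown in this output.

```shell
# Hypothetical helper: print only BTRFS error-level lines from stdin.
btrfs_errors() {
  grep -E 'BTRFS (error|critical)'
}

# Hypothetical helper: print the devid numbers reported as missing.
missing_devids() {
  sed -n 's/.*devid \([0-9][0-9]*\) uuid [0-9a-f-]* is missing.*/\1/p'
}

# Usage, assuming the kernel log was saved to a file named log.txt:
#   btrfs_errors < log.txt
#   missing_devids < log.txt
```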

Additional information

System information:
root@openmediavault:# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
root@openmediavault:# uname -a
Linux openmediavault 6.1.99 #32 SMP Mon Jan 20 15:50:32 CST 2025 aarch64 GNU/Linux
root@openmediavault:# cat /sys/kernel/debug/rknpu/version
RKNPU driver: v0.9.8
root@openmediavault:/usr/lib# dpkg -l | grep linux-image
ii linux-image-6.1.57 6.1.57-13 arm64 Linux kernel, version 6.1.57
ii linux-image-6.1.99 6.1.99-32 arm64 Linux kernel, version 6.1.

This is the log extracted with journalctl:
log.txt (https://github.com/user-attachments/files/19508160/log.txt)

Problem already referenced in:
#17098

I'm not very familiar with Linux systems, so I don't know exactly how I can help identify the cause.


@alextran1502 commented on GitHub (Mar 28, 2025):

Did you try the search without hardware accel and confirm that this behavior doesn't happen?


@raptr55 commented on GitHub (Mar 28, 2025):

> Did you try the search without hardware accel and confirm that this behavior doesn't happen?

I confirm that it does not happen without hardware acceleration.


@mmomjian commented on GitHub (Mar 28, 2025):

I don’t see how this could be an Immich issue. It is most likely a hardware issue with the drive or filesystem.
