[Bug] CUDA ML crash (CUDNN_STATUS_EXECUTION_FAILED) in v2.1.0 — fixed by downgrading to v2.0.0 #7072

Closed
opened 2026-02-20 04:19:33 -05:00 by deekerman · 46 comments
Owner

Originally created by @TheOneWayTruth on GitHub (Oct 16, 2025).

I have searched the existing issues, both open and closed, to make sure this is not a duplicate report.

  • Yes

The bug

Summary
Immich 2.1.0 causes repeated CUDNN_STATUS_EXECUTION_FAILED errors in immich-machine-learning during ONNX inference on NVIDIA GPUs.
Downgrading the entire stack to v2.0.0 resolves the issue completely.

Workaround
Downgrading all containers to 2.0.0 eliminates the crash.
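
For compose users, that downgrade can be sketched as follows, assuming the stack's images are tagged with `${IMMICH_VERSION:-release}` (as in the docker-compose.yml below) and that `.env` sits next to the compose file; the docker commands are left commented so the pin can be reviewed before applying:

```shell
# Pin the whole stack to v2.0.0 via the IMMICH_VERSION compose variable.
# Assumes image tags look like ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}.
if grep -q '^IMMICH_VERSION=' .env 2>/dev/null; then
    # replace an existing pin in place
    sed -i 's/^IMMICH_VERSION=.*/IMMICH_VERSION=v2.0.0/' .env
else
    # or add the variable if it was never set
    echo 'IMMICH_VERSION=v2.0.0' >> .env
fi
# then re-pull and restart the stack:
# docker compose pull && docker compose up -d
```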

The OS that Immich Server is running on

Windows WSL

Version of Immich Server

2.1.0

Version of Immich Mobile App

Platform with the issue

  • Server

Device make and model

No response

Your docker-compose.yml content

immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda
    extends:
      file: hwaccel.ml.yml
      service: cuda          
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always
    healthcheck:
      disable: false
    networks:
      - immich-net

Your .env content

Irrelevant

Reproduction steps

  1. Update to 2.1.0 with an NVIDIA graphics card
  2. Update Faces
    ...

Relevant log output

Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedConv node. Name:'Conv_0' Status Message: CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=5e88cd429d96 ; file=/onnxruntime_src/onnxruntime/contrib_ops/cuda/fused_conv.cc ; line=67 ; expr=cudnnConvolutionForward(cudnnHandle, &alpha, Base::s_.x_tensor, Base::s_.x_data, Base::s_.w_desc, Base::s_.w_data, Base::s_.conv_desc, Base::s_.algo, workspace.get(), Base::s_.workspace_bytes, &beta, Base::s_.y_tensor, Base::s_.y_data);
[10/16/25 04:26:45] ERROR Exception in ASGI application

Additional information

No response

@jdicioccio commented on GitHub (Oct 16, 2025):

same general issue here. searches are now failing

immich_machine_learning  | 2025-10-16 10:26:10.323120331 [E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=d994606dd310 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/reduction/reduction_ops.cc ; line=774 ; expr=cudnnReduceTensor(GetCudnnHandle(ctx), reduce_desc, indices_cuda.get(), indices_bytes, workspace_cuda.get(), workspace_bytes, &one, input_tensor, temp_X.get(), &zero, output_tensor, temp_Y.get());
immich_machine_learning  | 2025-10-16 10:26:10.323163167 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running ReduceSum node. Name:'/ReduceSum_1' Status Message: CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=d994606dd310 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/reduction/reduction_ops.cc ; line=774 ; expr=cudnnReduceTensor(GetCudnnHandle(ctx), reduce_desc, indices_cuda.get(), indices_bytes, workspace_cuda.get(), workspace_bytes, &one, input_tensor, temp_X.get(), &zero, output_tensor, temp_Y.get());

@cguertin14 commented on GitHub (Oct 16, 2025):

Same bug for me in v2.1.0, downgrading to v2.0.1 fixes it.


@cloudcogsio commented on GitHub (Oct 16, 2025):

+1 on this issue.

v2.1.0 Machine Learning (CUDA) does not work with Quadro P400 BUT it works with RTX A2000

RTX A2000 Details (v2.1.0 Works) (WSL 2):
NVIDIA-SMI 553.24 Driver Version: 553.24 CUDA Version: 12.4

Quadro P400 Details (v2.1.0 Fails) (Linux VM):
NVIDIA-SMI 550.163.01 Driver Version: 550.163.01 CUDA Version: 12.4

Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running ReduceL2 node. Name:'/ReduceL2' Status Message: CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=58da9360e0fd ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/reduction/reduction_ops.cc ; line=571 ; expr=cudnnReduceTensor( CudaKernel::GetCudnnHandle(cuda_stream), reduce_desc, indices_cuda.get(), indices_bytes, workspace_cuda.get(), workspace_bytes, &one, input_tensor, reinterpret_cast<const CudaT*>(input.Data<T>()), &zero, output_tensor, p_output);

Last call of traceback:

/opt/venv/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:220 in run

   217    if not output_names:
   218        output_names = [output.name
   219    try:
❱  220        return self._sess.run(output
   221    except C.EPFail as err:
   222        if self._enable_fallback:
   223            print(f"EP Error: {err!s

@mertalev commented on GitHub (Oct 16, 2025):

There were no CUDA-related changes in this release, but I suspect that #17718 triggered a rebuild of the CUDA image that might be installing different dependency versions now. Will look into fixing this.


@V-e-n-i-m commented on GitHub (Oct 16, 2025):

I have several ML related issues on this version as well with my Quadro P2000.


@Sacryn commented on GitHub (Oct 16, 2025):

Same issue here, GTX-960
NVIDIA-SMI 570.172.08 Driver Version: 570.172.08 CUDA Version: 12.8

I'm also getting issues with Conv_0

Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedConv node. Name:'Conv_0' Status Message: CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=2887c47f5239 ; file=/onnxruntime_src/onnxruntime/contrib_ops/cuda/fused_conv.cc ; line=67 ; expr=cudnnConvolutionForward(cudnnHandle, &alpha, Base::s_.x_tensor, Base::s_.x_data, Base::s_.w_desc, Base::s_.w_data, Base::s_.conv_desc, Base::s_.algo, workspace.get(), Base::s_.workspace_bytes, &beta, Base::s_.y_tensor, Base::s_.y_data);

@DrSpaldo commented on GitHub (Oct 17, 2025):

I just wanted to jump on and say that I have just started getting errors after updating. I initially didn't find this thread and posted in this one instead: https://github.com/immich-app/immich/discussions/20081

Interestingly, I am running two Immich servers, both on Unraid and both using the usual docker compose (not the imagegenius app), and one seems to be working better than the other:

  • I have a 1070 running NVIDIA Driver Version: 580.82.09 & CUDA Version: 13.0, which gives the errors and fails.
  • The other machine is running a newer 3060 but an older NVIDIA Driver Version: 570.169 & CUDA Version: 12.8, and isn't initially giving those errors. I have completed the re-index and it is able to use smart search.

I have downgraded the 1070 machine to 2.0.1 and am re-indexing to see if that works; initially, no errors.

So, without jumping the gun too much, I think the changes made by mertalev in https://github.com/immich-app/immich/pull/17718 are causing the issues, but not for all cards/drivers/CUDA versions. I guess the changes in 2.1.0 need to be either reverted or narrowed down to the actual cause, then fixed.

2.1.0 works with

  • NVIDIA chipset 3060
  • NVIDIA older driver version 570.169
  • CUDA older version 12.8

2.1.0 does not work with (but reverting to 2.0.1 makes it work again)

  • NVIDIA chipset 1070
  • NVIDIA current driver version 580.82.09
  • CUDA current version 13.0

@Nullpo1nt commented on GitHub (Oct 17, 2025):

I'll add onto this, just upgraded to v2.1.0 today and encountered this issue. Reverting the ML service to v2.0.1 mitigates this for now.

  • NVIDIA GTX 970
  • Driver 570.195.03
  • CUDA 12.8

@sigma-2 commented on GitHub (Oct 18, 2025):

I had the same problem with v2.1.0. Rolling back to v2.0.1 solved it.

  • NVIDIA 1050 Ti
  • Driver 550.142
  • Cuda 12.4
  • TrueNAS Scale 25.04.2.4

@Magnus987 commented on GitHub (Oct 19, 2025):

I can confirm this issue with my setup as well. Downgrading to v2.0.1 resolved it completely.

Hardware:

  • NVIDIA GeForce GTX 1060 6GB
  • Driver Version: 535.261.03
  • CUDA Version: 12.2

Error logs from v2.1.0:

mimalloc: warning: mi_usable_size: pointer might not point to a valid heap region: 0x78de40020000
(this may still be a valid very large allocation (over 64MiB))
mimalloc: warning: (yes, the previous pointer 0x78de40020000 was valid after all)

2025-10-19 15:54:04.792097956 [E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=46cc5983962b ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/reduction/reduction_ops.cc ; line=571 ; expr=cudnnReduceTensor( CudaKernel::GetCudnnHandle(cuda_stream), reduce_desc, indices_cuda.get(), indices_bytes, workspace_cuda.get(), workspace_bytes, &one, input_tensor, reinterpret_cast<const CudaT*>(input.Data<T>()), &zero, output_tensor, p_output); 

2025-10-19 15:54:04.792149732 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running ReduceL2 node. Name:'/ReduceL2' Status Message: CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED

After downgrading to v2.0.1, everything works perfectly again.
Thanks to the Devs for having an eye on this and working on a fix!


@Nordtus commented on GitHub (Oct 19, 2025):

Same problem on v2.1.0.

  • NVIDIA 1060 6GB
  • Driver 550.107.02
  • CUDA 12.4
  • Proxmox LXC container (Debian GNU/Linux 12 (bookworm))

@ErichVonHampter commented on GitHub (Oct 19, 2025):

Installed TrueNAS Scale 25.04.2.4 on a DL20 Gen9 today; ML also doesn't work after following the guide. When I upload images, I see that a python process is using the GPU (about 140 MB of VRAM) and it peaks at 1% GPU utilization at most.

Downgrading from the current release to 2.0.1 seems nearly impossible without fully changing everything in TrueNAS Scale, and installing another container image looks nearly impossible too. If anyone has a good guide on how to get back to v2.0.1, I would be very thankful.

GPU: Nvidia P1000
Driver: 550.142
CUDA: 12.4


@SecretAgentOne commented on GitHub (Oct 19, 2025):

I experienced the same issue on an NVIDIA Geforce GTX 1080 with v2.1.0. Reverting just the immich-machine-learning to v2.0.1-cuda made it work again.

EDIT: I'm running...

  • Ubuntu 24.04 LTS on kernel 6.8.05 on x64
  • nvidia-driver & nvidia-compute-utils 580.95.05
  • nvidia-container-toolkit 1.17.9-1
  • CUDA version 13

This is the error I saw in the logs:

Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedConv node. Name:'Conv_0' Status Message: CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED

@DrSpaldo commented on GitHub (Oct 19, 2025):

@alextran1502 do any of the dev team have NVIDIA cards that are, say, RTX 2xxx or 3xxx, to see if that is what is causing the issue? Other than my 3060, which appears to be working, I don't see any other users in this thread having this issue who have newish cards.

I am wondering if the changes that @mertalev made in the 2.1.0 release have some type of unknown requirement that you are not aware of?

In any event, I think it would be reasonable to roll back the changes if you cannot find the fix, as quite a few users are having issues after they were made. Remember, no more breaking changes ;)


@mertalev commented on GitHub (Oct 19, 2025):

I'll probably try to install the old CUDNN version for now to restore the old behavior, but I'm unsure as to how to handle these updates moving forward. It seems difficult to be confident an update won't cause issues like this given only some environments are affected.
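
One way that pin could be sketched is a local overlay image. This is a hypothetical sketch only: the exact cuDNN package name, the version that v2.0.x shipped, and whether NVIDIA's apt repository is configured inside this image are all assumptions that would need to be verified against the actual image:

```dockerfile
# Hypothetical overlay: downgrade cuDNN inside the CUDA ML image.
# Package name and <old-version> are placeholders - verify against the real image.
FROM ghcr.io/immich-app/immich-machine-learning:v2.1.0-cuda
USER root
RUN apt-get update \
 && apt-get install -y --allow-downgrades libcudnn9-cuda-12=<old-version> \
 && rm -rf /var/lib/apt/lists/*
```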


@DrSpaldo commented on GitHub (Oct 19, 2025):

> I'll probably try to install the old CUDNN version for now to restore the old behavior, but I'm unsure as to how to handle these updates moving forward. It seems difficult to be confident an update won't cause issues like this given only some environments are affected.

Pretty much what programming / testing is about?


@sigma-2 commented on GitHub (Oct 20, 2025):

> Installed TrueNAS Scale 25.04.2.4 on a DL20 Gen9 today, ML also doesnt work after following the guided. When i upload Images then i see that a python process is using the GPU (About 140mb vRAM used) and then gets 1% GPU-Util at max.
>
> Downgrading from the current release to 2.0.1 seems nearly impossible without fully changing everything in TrueNAS Scale. Installing another container image looks nearly impossible. If anyone has a good guide on how to get back to v2.0.1 then i would be very thankfull.
>
> GPU: Nvidia P1000 Driver: 550.142 CUDA: 12.4

I did this to downgrade it on TrueNAS SCALE:

TrueNAS SCALE Apps don't let you downgrade directly through the UI, but you can pin a specific image version manually:

  1. Stop the Immich app:
    • Go to Apps → Installed Apps → Immich → Stop.
  2. Edit the app configuration:
    • Click the three dots → Edit.
    • Scroll to the Container Images or Image Repository/Tag fields (it depends on the catalog).
    • Replace the current image tag (likely latest or release) with:
      ghcr.io/immich-app/immich-server:2.0.1
      ghcr.io/immich-app/immich-machine-learning:2.0.1
    • Do this for both containers (server and machine-learning), and if you see immich-microservices, apply it too.
  3. Save and redeploy the app.
  4. Wait for it to pull and start — you can confirm the version with:
      docker exec -it ix-immich-server immich --version
    (adjust the container name if needed)

Hope this helps you!


@ErichVonHampter commented on GitHub (Oct 20, 2025):

> I did this to downgrade it on TrueNAS SCALE:
>
> TrueNAS SCALE Apps don't let you downgrade directly through the UI, but you can pin a specific image version manually:
>
>   1. Stop the Immich app: go to Apps → Installed Apps → Immich → Stop.
>   2. Edit the app configuration: click the three dots → Edit, scroll to the Container Images or Image Repository/Tag fields, and replace the current image tag (likely latest or release) with ghcr.io/immich-app/immich-server:2.0.1 and ghcr.io/immich-app/immich-machine-learning:2.0.1 for both containers (and immich-microservices if present).
>   3. Save and redeploy the app.
>   4. Wait for it to pull and start, then confirm the version with: docker exec -it ix-immich-server immich --version
>
> Hope this helps you!

Sadly no, and it also seems like 1:1 the same thing that ChatGPT told me to do. I don't have "three dots → Edit", but "Edit" with three dots next to it. Also, there is no chance to click on anything that is anywhere near "App Configuration".

Thinking "yeah, I just install something Linux-based and it will just work" wasn't the right way to think, I guess. Maybe the best way for me would be to wait until an update comes along.

Edit 20.10.2025 #1: Deleting the "ghcr.io/immich-app/immich-machine-learning:v2.1.0-cuda" image and using "ghcr.io/immich-app/immich-machine-learning:v2.0.1-cuda" without any tag at least downloaded the right image, it seems. Going to check it further.

Edit 20.10.2025 #2: After deleting everything from the image list that had "2.1.0" in its name AND then pulling the 2.0.1 images again, the 2.1.0 ones do reappear after starting the app.

Edit 20.10.2025 #3: Changed Immich to "Custom App", changed the image reference in the then-editable text file, and now it works. But now Immich is a Custom App, and I would guess that this will also give me other problems.


@sigma-2 commented on GitHub (Oct 20, 2025):

Installed TrueNAS SCALE 25.04.2.4 on a DL20 Gen9 today; ML also doesn't work after following the guide. When I upload images, I see a Python process using the GPU (about 140 MB of VRAM) that reaches at most 1% GPU utilization.
Downgrading from the current release to 2.0.1 seems nearly impossible without fully changing everything in TrueNAS SCALE. Installing another container image looks nearly impossible. If anyone has a good guide on how to get back to v2.0.1, I would be very thankful.
GPU: NVIDIA P1000, Driver: 550.142, CUDA: 12.4

I did this to downgrade it on truenas scale:
TrueNAS SCALE Apps don’t let you downgrade directly through the UI, but you can pin a specific image version manually:

1. **Stop** the Immich app:
   
   * Go to **Apps → Installed Apps → Immich → Stop**.

2. **Edit the app configuration:**
   
   * Click the **three dots → Edit**.
   * Scroll to the **Container Images** or **Image Repository/Tag** fields (it depends on the catalog).
   * Replace the current image tag (likely `latest` or `release`) with:

     ghcr.io/immich-app/immich-server:2.0.1
     ghcr.io/immich-app/immich-machine-learning:2.0.1

   * Do this for **both** containers (`server` and `machine-learning`), and if you see `immich-microservices`, apply it too.
3. **Save and redeploy** the app.

4. Wait for it to pull and start — you can confirm the version with:

     docker exec -it ix-immich-server immich --version

   (adjust the container name if needed)

Hope this helps you!

Sadly no, it also seems like exactly the same thing that ChatGPT told me to do. I don't have "three dots → Edit", but "Edit" with three dots next to it. Also, there is no option to click on anything that is even close to "App Configuration".

Thinking "I'll just install something Linux-based and it will just work" wasn't the right way to think, I guess. Maybe the best way for me would be to wait until an update comes along.

Edit 20.10.2025 #1: Deleting the "ghcr.io/immich-app/immich-machine-learning:v2.1.0-cuda" image and using "ghcr.io/immich-app/immich-machine-learning:v2.0.1-cuda" without any tag at least downloaded the right image, as far as I can tell. Going to check it further.

Edit 20.10.2025 #2: After deleting everything from the image list that had "2.1.0" in its name AND then pulling the 2.0.1 images again, the 2.1.0 images reappear after starting the app.

Edit 20.10.2025 #3: Changed Immich to a "Custom App", changed the image reference in the then-editable text field, and now it works. But now Immich is a custom app, and I would guess that this will also cause other problems.

Right, I forgot about the custom app part, sorry. I changed it too to a custom app in the text file as per the instructions.
My guess is that in the future, updates will have to be done manually through this text file.


@ErichVonHampter commented on GitHub (Oct 20, 2025):

@sigma-2 Thank you for your help. It now works as intended! :)


@tr1plus commented on GitHub (Oct 20, 2025):

Just want to jump in to increase visibility and mention I have the same issue.


@j8ith commented on GitHub (Oct 20, 2025):

I also have the same issue with 2.1.0, rolling back to 2.0.1 confirmed working.


@mertalev commented on GitHub (Oct 20, 2025):

Immich 2.0.1 installed cuDNN 9.8, while 2.1.0 installs 9.14. It seems Pascal and Maxwell cards are no longer supported as of cuDNN 9.11 (see [9.10](https://docs.nvidia.com/deeplearning/cudnn/backend/v9.10.0/reference/support-matrix.html) vs [9.11](https://docs.nvidia.com/deeplearning/cudnn/backend/v9.11.0/reference/support-matrix.html)), which explains why everyone with the issue has a Pascal or Maxwell GPU. The solution will be to pin to 9.10, and probably add a new CUDA 13 image variant so newer GPUs can continue to receive updates.
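The cutoff described above can be captured in a small check (an illustrative sketch, not Immich code; the function name is mine, while the version facts come from the comment and the linked support matrices):

```python
def supports_pascal_maxwell(cudnn_version: str) -> bool:
    """True if this cuDNN release still runs on Maxwell/Pascal GPUs.

    Per NVIDIA's support matrices, cuDNN 9.10 is the last release that
    supports pre-Turing cards; 9.11 and later drop them.
    """
    major, minor = (int(part) for part in cudnn_version.split(".")[:2])
    return (major, minor) <= (9, 10)

# Immich 2.0.1 shipped cuDNN 9.8 (worked on Pascal cards);
# 2.1.0 shipped 9.14 (CUDNN_STATUS_EXECUTION_FAILED on the same cards).
print(supports_pascal_maxwell("9.8.0"))   # True
print(supports_pascal_maxwell("9.14.0"))  # False
```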


@mertalev commented on GitHub (Oct 20, 2025):

Would anyone care to try ghcr.io/immich-app/immich-machine-learning:pr-23110-cuda?
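For plain Docker Compose users (not TrueNAS), a low-friction way to try the PR image without touching the main compose file is an override file (a sketch; it assumes the service is named `immich-machine-learning` as in the official compose file):

```yaml
# docker-compose.override.yml — Compose merges this over docker-compose.yml,
# pinning only the ML image while leaving everything else on the release tag.
services:
  immich-machine-learning:
    image: ghcr.io/immich-app/immich-machine-learning:pr-23110-cuda
```

Then `docker compose up -d` recreates just that service; deleting the override file and re-running `docker compose up -d` reverts it.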


@DrSpaldo commented on GitHub (Oct 20, 2025):

Would anyone care to try ghcr.io/immich-app/immich-machine-learning:pr-23110-cuda?

Great thinking @mertalev, glad you were able to look into it and come up with a plan. I'll give that build a go in a few hours. Should I test on both the 1070 and the 3060, or just the 1070?


@mertalev commented on GitHub (Oct 20, 2025):

The 1070 is the important one to test since I can confirm it works on my 4090, but if you can test both then even better.


@dasunsrule32 commented on GitHub (Oct 20, 2025):

Would anyone care to try ghcr.io/immich-app/immich-machine-learning:pr-23110-cuda?

It worked perfectly on my 1080ti. Android App timeline was busted on that build though. haha


@thardie commented on GitHub (Oct 20, 2025):

Confirmed ghcr.io/immich-app/immich-machine-learning:pr-23110-cuda works on my Tesla P40. Thanks!


@j8ith commented on GitHub (Oct 20, 2025):

Confirmed ghcr.io/immich-app/immich-machine-learning:pr-23110-cuda works on my 1060. Thanks!


@Sacryn commented on GitHub (Oct 21, 2025):

Immich 2.0.1 installed cuDNN 9.8, while 2.1.0 installs 9.14. It seems Pascal and Maxwell cards are no longer supported as of cuDNN 9.11 (see 9.10 vs 9.11), which explains why everyone with the issue has a Pascal or Maxwell GPU. The solution will be to pin to 9.10, and probably add a new CUDA 13 image variant so newer GPUs can continue to receive updates.

According to the release notes of 9.11, everything older than Turing (GTX 16xx & RTX 20xx cards) has been dropped.
The pr-image worked like a charm for my old GTX 960.


@KristianKarl commented on GitHub (Oct 21, 2025):

Confirmed ghcr.io/immich-app/immich-machine-learning:pr-23110-cuda works on my GTX 960. Thanks!


@tr1plus commented on GitHub (Oct 21, 2025):

What would be the "official" way to have this supported long term now?
Will a separate machine-learning container/image be provided?

I am using the official docker compose and use Ansible to modify certain locations for my use case (e.g. enable machine learning, ...), so I would like to know the expected approach for modifying my Ansible scripts.


@Jonathan-Ddn commented on GitHub (Oct 21, 2025):

Had the exact same issue; can confirm @mertalev's solution works for the Quadro P400.


@mertalev commented on GitHub (Oct 21, 2025):

What would be the "official" way to have this supported long term now?
Will a separate machine-learning container/image be provided?

I am using the official docker compose and use Ansible to modify certain locations for my use case (e.g. enable machine learning, ...), so I would like to know the expected approach for modifying my Ansible scripts.

The plan is to add a `-cuda-12` tag for the current CUDA image and (later) `-cuda-13` for newer cards. `-cuda` will start pointing to the latest CUDA version we support as of 3.0 (as a breaking change), which will likely be CUDA 13. That means using `-cuda-12` will be the best way to avoid disruption as long as we support it.
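In compose terms, following that plan would mean pinning the tag suffix once `-cuda-12` is published (a sketch based on the naming described above; the tag did not exist yet at the time of this comment):

```yaml
services:
  immich-machine-learning:
    # -cuda-12 keeps Pascal/Maxwell support; plain -cuda will move to CUDA 13 in 3.0
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda-12
```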


@yeeahnick commented on GitHub (Oct 21, 2025):

For TrueNAS SCALE users who want the test tag fix, here’s what worked for me:

1. Stop the Immich app.
2. Edit the `ix_values.yaml` file located at `/mnt/.ix-apps/app_configs/immich/versions/1.10.7/` and change the image tag from `v2.1.0-cuda` to `pr-23110-cuda`.
3. Go to Apps → Configuration → Manage Container Images and manually pull the new image.
4. Start the Immich app.
5. Edit the Immich app and simply click Save to trigger a redeploy.


@MWP commented on GitHub (Oct 22, 2025):

Same problem here with a GTX1060.
The "pr-23110-cuda" image works.


@wajer1 commented on GitHub (Oct 29, 2025):

Same problem here with a Tesla M4.
The "pr-23110-cuda" image works.


@DrSpaldo commented on GitHub (Oct 30, 2025):

Do we still need to include the pr-23110-cuda tag on the ML container? I.e., is this now included in 2.2.0?

Edit: never mind, I was searching the changelog for CUDA, not cudnn — found this:

fix(ml): pin cudnn version by @mertalev in https://github.com/immich-app/immich/pull/23110


@zvarnes commented on GitHub (Oct 30, 2025):

Can someone tell me where I'm supposed to put this tag?

Also, was the fix merged? I'm struggling to understand the core issue here, and how it's still broken in the latest release.


@benjoon90 commented on GitHub (Oct 30, 2025):

Can someone tell me where I'm supposed to put this tag?

Also, was the fix merged? I'm struggling to understand the core issue here, and how it's still broken in the latest release.

It's supposed to go into your compose file under the immich-machine-learning container:

immich-machine-learning:
  container_name: immich_machine_learning
  # For hardware acceleration, add one of -[armnn, cuda, rocm, openvino, rknn] to the image tag.
  # Example tag: ${IMMICH_VERSION:-release}-cuda
  image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda

It's fixed with 2.2.0 for the NVIDIA GTX 1070. Remember to pull the latest image after editing the compose file.
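For anyone unsure what `${IMMICH_VERSION:-release}` does in that image line: it is ordinary shell-style default expansion (Compose uses the same syntax for `.env` variables), so you can preview the resolved tag in any POSIX shell:

```shell
# With IMMICH_VERSION set (e.g. from .env), the tag resolves to that version:
IMMICH_VERSION=v2.2.0
echo "ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda"
# ghcr.io/immich-app/immich-machine-learning:v2.2.0-cuda

# Without it, the :-release default kicks in:
unset IMMICH_VERSION
echo "ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda"
# ghcr.io/immich-app/immich-machine-learning:release-cuda
```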


@zvarnes commented on GitHub (Oct 30, 2025):

Okay... I guess the issue is resolved in the new version. I was seeing errors again, but it seems to be down to something else. Thanks for the quick reply.


@yeeahnick commented on GitHub (Nov 3, 2025):

Issue is resolved, but does that mean newer cards won't work? I am upgrading my NVIDIA GPU (Pascal to Turing) since TrueNAS stopped supporting older cards, and am wondering if it will work with Immich. Was the change reverted, or is there some sort of logic to detect which driver to use?

Thanks


@DrSpaldo commented on GitHub (Nov 3, 2025):

@yeeahnick, yep, they are keeping the older version of CUDA to support the older cards. There will be a new image variant at some stage for those people with newer cards.


@yeeahnick commented on GitHub (Nov 3, 2025):

@yeeahnick , yep, they are keeping the older version of CUDA to support the older cards. There will be a new variable at some stage for those people with newer cards

Until then what happens to TrueNAS users? The latest version (Goldeneye) stopped supporting the old driver in favor of the newer one. The Immich community app in TrueNAS should be updated asap unless I'm missing something.


@DrSpaldo commented on GitHub (Nov 3, 2025):

Until then what happens to TrueNAS users? The latest version (Goldeneye) stopped supporting the old driver in favor of the newer one. The Immich community app in TrueNAS should be updated asap unless I'm missing something.

@yeeahnick, I don't use TrueNAS, so I can't really comment too much on it. But most apps just pull the official containers, so going to 2.2.0 or newer should revert the cudnn version to the more compatible one. Have you tried updating the version? While it wasn't working, did you revert the changes you previously made in ix_values.yaml? I would think it just needs to be set back to https://github.com/truenas/apps/blob/master/ix-dev/community/immich/ix_values.yaml


@yeeahnick commented on GitHub (Nov 5, 2025):

Until then what happens to TrueNAS users? The latest version (Goldeneye) stopped supporting the old driver in favor of the newer one. The Immich community app in TrueNAS should be updated asap unless I'm missing something.

@yeeahnick, I don't use TrueNAS, so can't really comment too much on it. But, most apps just pull the container from the official ones, so going to 2.2.0 or newer should revert the cudnn version to the more compatible version. Have you tried updating the version? While it wasn't working, did you revert the changes you previously made in ix_values.yaml ? I would think it just needs to be back to https://github.com/truenas/apps/blob/master/ix-dev/community/immich/ix_values.yaml

All good, I installed the new Turing GPU (Quadro RTX 4000) and it's working with Immich 2.2.2 on TrueNAS.
