VirtualBox: TensorFlow with AVX2 support does not work due to missing FMA instructions #2186

Closed
opened 2026-02-20 01:07:38 -05:00 by deekerman · 5 comments

Originally created by @keif888 on GitHub (Sep 14, 2024).

1. What is not working as documented?

The development build environment does not work in a VirtualBox guest, because the TensorFlow library aborts on startup.

localuser@debian-pp-dev:~/repos/photoprism$ make terminal
docker compose exec -u 1000 photoprism bash
photoprism@240827-noble:/go/src/github.com/photoprism/photoprism$ ./photoprism start
2024-09-14 12:38:57.363445: F tensorflow/core/platform/cpu_feature_guard.cc:37] The TensorFlow library was compiled to use FMA instructions, but these aren't available on your machine.
Aborted (core dumped)

2. How can we reproduce it?

Install Debian 12.7 (or another supported Linux distribution) in VirtualBox on a host that supports the AVX or AVX2 instruction set.
Install Docker.
Install the PhotoPrism development environment as per the https://docs.photoprism.app/developer-guide/setup/ instructions.

3. What behavior do you expect?

PhotoPrism should start without errors.

4. What could be the cause of your problem?

/scripts/dist/install-tensorflow.sh chooses the avx2 version of the TensorFlow library, which also uses FMA instructions.
If FMA is not available, it needs to choose the basic version of the TensorFlow library instead.
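
The selection rule described above can be sketched as follows. This is only an illustration, not the code from install-tensorflow.sh: `has_flag` and `tf_variant_for_flags` are hypothetical helper names, the value `basic` is a placeholder for the non-optimized build, and the flag names follow the spelling used in `/proc/cpuinfo` on Linux.

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical and not part of install-tensorflow.sh.

# has_flag "<flag list>" <flag>: succeeds if <flag> appears as a whole word in the list.
has_flag() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the build to use for a space-separated list of CPU flags.
tf_variant_for_flags() {
  # The avx2 build was compiled with FMA enabled, so both flags are required.
  if has_flag "$1" avx2 && has_flag "$1" fma; then
    echo "avx2"
  elif has_flag "$1" avx; then
    echo "avx"
  else
    echo "basic"  # placeholder name for the non-optimized build
  fi
}

# On Linux, read the flags of the first CPU listed in /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
  cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2 || true)
  echo "Suggested build: $(tf_variant_for_flags "$cpu_flags")"
fi
```

In a guest that reports `avx2` but not `fma`, this sketch falls back to `avx` instead of selecting the build that aborts.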

5. Can you provide us with example files for testing, error logs, or screenshots?

This is from the initial docker compose up output.

photoprism-1  | init: tensorflow
photoprism-1  | /scripts/install-tensorflow.sh auto
photoprism-1  | Detecting driver...
photoprism-1  | Installing TensorFlow 1.15.2 for AMD64-AVX2 in "/usr"...
photoprism-1  | Downloading amd64 libs from "https://dl.photoprism.app/tensorflow/amd64/libtensorflow-amd64-avx2-1.15.2.tar.gz". Please wait.
photoprism-1  | Extracting "/tmp/amd64/libtensorflow-amd64-avx2-1.15.2.tar.gz" to "/usr".

This is from the attempt to start photoprism.

keith@debian-pp-dev:~/repos/photoprism$ make terminal
docker compose exec -u 1000 photoprism bash
photoprism@240827-noble:/go/src/github.com/photoprism/photoprism$ ./photoprism start
2024-09-14 12:38:57.363445: F tensorflow/core/platform/cpu_feature_guard.cc:37] The TensorFlow library was compiled to use FMA instructions, but these aren't available on your machine.
Aborted (core dumped)

6. Which software versions do you use?

(a) PhotoPrism Architecture & Build Number: AMD64 using github.com/photoprism/photoprism@e808de45e3

(b) Database Type & Version: MariaDB

(c) Operating System Types & Versions: Linux Debian 12.7

(d) Browser Types & Versions: Firefox

(e) Ad Blockers, Browser Plugins, and/or Firewall Software? None

7. On what kind of device is PhotoPrism installed?

This is especially important if you are reporting a performance, import, or indexing issue. You can skip this if you're reporting a problem you found in our public demo, or if it's a completely unrelated issue, such as incorrect page layout.

(a) Device / Processor Type:
VirtualBox host with AMD Ryzen 7 5700X,...

(b) Physical Memory & Swap Space in GB
8 GB / 16 GB on the VirtualBox client

(c) Storage Type: HDD, SSD, RAID, USB, Network Storage,...
SSD

(d) Anything else that might be helpful to know?

8. Do you use a Reverse Proxy, Firewall, VPN, or CDN?

No


@lastzero commented on GitHub (Sep 15, 2024):

Thank you for bringing this to our attention! IMHO this is an upstream problem that should be fixed in VirtualBox as it also affects other applications:

- https://www.virtualbox.org/ticket/15471

The version of TensorFlow causing the problem was simply compiled for a specific CPU architecture level, which according to the developers of the gcc compiler suite includes AVX2 and FMA. So VirtualBox (or any other virtualization tool) should not invent its own instruction sets and standards that other developers then have to support to avoid getting unexpected bug reports :)

To avoid the error, the easiest way would be to change `PHOTOPRISM_INIT: "https tensorflow"` in your `compose.yaml` file to `PHOTOPRISM_INIT: "https"`, which skips installing a custom TensorFlow version in the first place. The reason why this is enabled by default in the development environment (as opposed to the version we distribute to end users) is that such errors should occur early during development (which was the case).

While we can consider merging the PR you created to work around this issue (while keeping the `PHOTOPRISM_INIT: "https tensorflow"` setting, which is not suitable in your case), there is a risk that other dependencies compiled for the same (or a newer) CPU architecture, such as Darktable or RawTherapee, might run into a similar problem. So it could be that the original error is gone, but there are other issues related to VirtualBox that are harder to find. I would therefore prefer if developers could use a fixed/improved VirtualBox version that supports FMA instructions.
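
The `PHOTOPRISM_INIT` change suggested in the comment above can also be scripted. The following is a minimal sketch, assuming the setting appears in `compose.yaml` exactly as quoted in the comment; `disable_tensorflow_init` is a hypothetical helper name, and editing the file by hand works just as well.

```shell
#!/bin/sh
# Sketch only: disable_tensorflow_init is a hypothetical helper name.
# It rewrites PHOTOPRISM_INIT: "https tensorflow" to PHOTOPRISM_INIT: "https"
# in the given compose file and keeps a .bak copy of the original next to it.
disable_tensorflow_init() {
  file="${1:-compose.yaml}"
  sed -i.bak 's/PHOTOPRISM_INIT: "https tensorflow"/PHOTOPRISM_INIT: "https"/' "$file"
}
```

Calling `disable_tensorflow_init compose.yaml` in the repository root would apply the change; the container then has to be recreated for the new value to take effect.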

@keif888 commented on GitHub (Sep 16, 2024):

I discovered that the VIA Eden X4 CPU has the same issue: it supports AVX2 but not FMA. Every other CPU with AVX2 also has FMA, at least among those that GCC 15 supports.
So there is one use case for this other than VirtualBox.

I wish Oracle would implement the FMA instructions in VirtualBox. They have been available since 2012, and the ticket requesting them in VirtualBox has been open for 8 years.

I agree that other libraries could have similar instruction set incompatibilities, although this is less likely for libraries installed via apt, since those are much more likely to be hit by VirtualBox users elsewhere before they break in PhotoPrism.

If you choose not to merge the pull request, could you add an FAQ entry about using VirtualBox, or about what to do if TensorFlow fails?

I have modified the pull request to improve the detection: it now ensures that both FMA and AVX2 are available before choosing avx2, and otherwise checks for avx (whose build does not require FMA). It uses jq's `all` filter, which returns true only if every value in the array is true, and false otherwise.

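  # Query the CPU capabilities once; the filters below assume lshw prints a JSON array of processors.
  CPU_DETECTED=$(lshw -c processor -json 2>/dev/null)

  # The avx2 build also needs FMA, so require both flags on every processor.
  if [[ $(echo "${CPU_DETECTED}" | jq -r '[.[].capabilities.avx2,.[].capabilities.fma] | all') == "true" ]]; then
    TF_DRIVER="avx2"
    echo "Driver avx2 detected"
  # Collect the avx flags in an array as well, so several processors still yield a single "true".
  elif [[ $(echo "${CPU_DETECTED}" | jq -r '[.[].capabilities.avx] | all') == "true" ]]; then
    TF_DRIVER="avx"
    echo "Driver avx detected"
  else
    # Neither optimized build is usable; fall back to the basic library.
    TF_DRIVER=""
    echo "No drivers detected"
  fi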

New Test Output (which is better than my previous solution).

photoprism@240827-noble:/go/src/github.com/photoprism/photoprism$ cd scripts/dist/
photoprism@240827-noble:/go/src/github.com/photoprism/photoprism/scripts/dist$ sudo ./install-tensorflow.sh auto
Detecting driver...
Driver avx detected
Installing TensorFlow 1.15.2 for AMD64-AVX in "/usr"...
Extracting "/tmp/amd64/libtensorflow-amd64-avx-1.15.2.tar.gz" to "/usr".
Running "ldconfig".
Done.
photoprism@240827-noble:/go/src/github.com/photoprism/photoprism/scripts/dist$ cd ../..
photoprism@240827-noble:/go/src/github.com/photoprism/photoprism$ ./photoprism start
DEBU[2024-09-16T10:43:32Z] config: running on 'AMD Ryzen 7 5700X 8-Core Processor', 8.3 GB memory detected 
DEBU[2024-09-16T10:43:32Z] settings: loaded from /go/src/github.com/photoprism/photoprism/storage/config/settings.yml 
DEBU[2024-09-16T10:43:32Z] vips: max cache size is 256 MB, using up to 2 workers 
INFO[2024-09-16T10:43:33Z] Become a member today, support our mission and enjoy our member benefits! 💎 
INFO[2024-09-16T10:43:33Z] Visit https://www.photoprism.app/membership to learn more. 
DEBU[2024-09-16T10:43:33Z] config: successfully initialized [1.158065883s] 

@srett commented on GitHub (Sep 18, 2024):

> The version of TensorFlow causing the problem was simply compiled for a specific CPU architecture level

There is no such thing really. The gcc optimization options are just shortcuts that enable the set of extended instructions found in a given CPU: you pick the CPU of your target system and get all the optimizations, instead of researching all the supported extensions manually and adding a dozen command line switches.
In that case it is rather a bug that the tensorflow install script only checks for AVX2 and assumes that all the other extensions of the targeted CPU architecture will be there too. @keif888 brought up that VIA CPU as a good example.
Another similar case would be AVX512, which Intel removed in more recent desktop CPUs, so assuming it will be present starting from a certain Intel CPU generation would be wrong too.
Actually in 2021 [I prototyped an automatic install script](https://github.com/photoprism/photoprism/issues/536#issuecomment-932363389) that picks the best library and would check for *all* the required CPU features that the corresponding `-march` gcc option enables. :)
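
The idea of checking every feature a build requires, rather than AVX2 alone, could look roughly like the sketch below. It is not the prototype linked in the comment above: `missing_cpu_flags` is a hypothetical helper, and `HASWELL_SUBSET` is only an illustrative subset of the haswell feature set, written with the flag spellings used in `/proc/cpuinfo`.

```shell
#!/bin/sh
# Sketch only: missing_cpu_flags is a hypothetical helper.
# Usage: missing_cpu_flags "<available flags>" <required flag>...
# Prints each required flag that is absent; returns 0 only if none is missing.
missing_cpu_flags() {
  available=" $1 "
  shift
  status=0
  for flag in "$@"; do
    case "$available" in
      *" $flag "*) ;;                   # flag is present
      *) echo "$flag"; status=1 ;;      # flag is missing
    esac
  done
  return "$status"
}

# Illustrative subset of the haswell features (assumption, not the complete list).
HASWELL_SUBSET="avx avx2 fma bmi1 bmi2 f16c movbe"

if [ -r /proc/cpuinfo ]; then
  cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2 || true)
  # HASWELL_SUBSET is intentionally unquoted so it splits into separate arguments.
  if absent=$(missing_cpu_flags "$cpu_flags" $HASWELL_SUBSET); then
    echo "All checked flags are present"
  else
    echo "Missing flags:" $absent
  fi
fi
```

An install script could pick the optimized build only when nothing is reported as missing, and fall back to a less demanding build otherwise.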

@lastzero commented on GitHub (Sep 18, 2024):

The preset used for compilation was "haswell":

- https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

It includes the MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND, F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE and HLE instructions. As we have limited resources available for development, we haven't added a dedicated check for all of them. IMHO if the optimized version doesn't work for you because you have a very special CPU or a VM that lacks support for some instructions, it's easy (and the default, except for the dev environment) to use the standard version. So there is no bug in the version of PhotoPrism provided to end users.

@keif888 commented on GitHub (Feb 24, 2025):

VirtualBox has been updated to support FMA instructions as of version 7.1.4.
As such, this issue is no longer a problem for me.
And as noted by lastzero, the PhotoPrism version provided to end users doesn't have this issue.
