mirror of
https://github.com/photoprism/photoprism.git
synced 2026-03-02 22:57:18 -05:00
VirtualBox: TensorFlow with AVX2 support does not work due to missing FMA instructions #2186
Originally created by @keif888 on GitHub (Sep 14, 2024).
1. What is not working as documented?
Development build environment does not work in VirtualBox environment.
This is because the tensorflow library fails.
2. How can we reproduce it?
Install Debian 12.7 (or another supported Linux distribution) in VirtualBox on a host whose CPU supports the AVX or AVX2 instruction set.
Install Docker.
Install the PhotoPrism development environment as per the https://docs.photoprism.app/developer-guide/setup/ instructions.
3. What behavior do you expect?
photoprism to start without errors.
4. What could be the cause of your problem?
/scripts/dist/install-tensorflow.sh is choosing the avx2 version of the tensorflow library, which also uses the fma instruction set.
It needs to choose the basic version of the tensorflow library instead if the fma instruction set is not available.
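A minimal sketch of how the mismatch described above could be detected, assuming the CPU features are read from the flags line of /proc/cpuinfo on Linux. The function names are hypothetical and this is not the logic used by install-tensorflow.sh.

```python
def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No 'flags' line found (for example on a non-x86 system).
    return set()


def avx2_without_fma(flags: set) -> bool:
    """True for the case in this report: AVX2 is advertised, but FMA is not."""
    return "avx2" in flags and "fma" not in flags
```

On a Linux guest, passing the contents of /proc/cpuinfo to parse_cpu_flags and the result to avx2_without_fma would be expected to return True inside an affected VirtualBox VM.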
5. Can you provide us with example files for testing, error logs, or screenshots?
This is from the initial docker compose up output.
This is from the attempt to start photoprism.
6. Which software versions do you use?
(a) PhotoPrism Architecture & Build Number: AMD64 using github.com/photoprism/photoprism@e808de45e3
(b) Database Type & Version: MariaDB
(c) Operating System Types & Versions: Linux Debian 12.7
(d) Browser Types & Versions: Firefox
(e) Ad Blockers, Browser Plugins, and/or Firewall Software? None
7. On what kind of device is PhotoPrism installed?
This is especially important if you are reporting a performance, import, or indexing issue. You can skip this if you're reporting a problem you found in our public demo, or if it's a completely unrelated issue, such as incorrect page layout.
(a) Device / Processor Type:
VirtualBox host with AMD Ryzen 7 5700X,...
(b) Physical Memory & Swap Space in GB
8 GB / 16 GB on VirtualBox Client
(c) Storage Type: HDD, SSD, RAID, USB, Network Storage,...
SSD
(d) Anything else that might be helpful to know?
8. Do you use a Reverse Proxy, Firewall, VPN, or CDN?
No
@lastzero commented on GitHub (Sep 15, 2024):
Thank you for bringing this to our attention! IMHO this is an upstream problem that should be fixed in VirtualBox as it also affects other applications:
The version of TensorFlow causing the problem was simply compiled for a specific CPU architecture level, which according to the developers of the gcc compiler suite includes AVX2 and FMA. So VirtualBox (or any other virtualization tool) should not invent its own instruction sets and standards that other developers then have to support to avoid getting unexpected bug reports :)
To avoid the error, the easiest way would be to change PHOTOPRISM_INIT: "https tensorflow" in your compose.yaml file to PHOTOPRISM_INIT: "https", which skips installing a custom TensorFlow version in the first place. The reason why this is enabled by default (as opposed to the version we distribute to end users) is that such errors should occur early during development (which was the case).
While we can consider merging the PR you created to work around this issue (while keeping the PHOTOPRISM_INIT: "https tensorflow" setting, which is not suitable in your case), there is a risk that other dependencies compiled for the same (or newer) CPU architecture, such as Darktable or RawTherapee, might run into a similar problem. So it could be that the original error is gone, but there are other issues related to VirtualBox that are harder to find. I would therefore prefer if developers could use a fixed/improved VirtualBox version that supports FMA instructions.
@keif888 commented on GitHub (Sep 16, 2024):
I discovered that the VIA Eden X4 CPU will have the same issue as it doesn't support FMA, but supports AVX2. Every other CPU that has AVX2 also has FMA, at least the ones that GCC 15 supports.
So there is one use case other than VirtualBox for this.
I wish that Oracle would implement the FMA instruction in VirtualBox. The instruction has been available since 2012, and the ticket for it to be implemented in VirtualBox has been open for 8 years.
I agree that other libraries could also have similar issues with instruction set incompatibility, although this is less likely with libraries that are being utilised via apt as they are much more likely to run into VirtualBox users before it breaks in PhotoPrism.
If you choose not to implement the pull, could you add an FAQ about using VirtualBox? Or what to do if tensorflow fails?
I have modified the pull to improve the way the issue is detected: it now ensures that both FMA and AVX2 are available before choosing avx2, and then checks for avx (which doesn't have fma in its required capabilities). This uses the all function in jq, which returns true only if every value in the array is true, and false otherwise.
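The selection order described here can be sketched in Python, whose built-in all() behaves like the jq function mentioned above. This is only an illustration under assumptions: the variant table and the fallback name "cpu" are hypothetical, and the actual pull request uses shell and jq.

```python
# Hypothetical variant table, ordered from most to least demanding.
# Each entry lists the CPU flags that must ALL be present.
VARIANTS = [
    ("avx2", ("avx2", "fma")),  # the avx2 build also uses FMA instructions
    ("avx", ("avx",)),          # the avx build does not require fma
]


def choose_variant(flags) -> str:
    """Return the first variant whose required flags are all available."""
    for name, required in VARIANTS:
        if all(flag in flags for flag in required):
            return name
    # Assumed name for the basic build without optional extensions.
    return "cpu"
```

With flags such as {"avx", "avx2"} (AVX2 without FMA, as in VirtualBox before the fix), this falls through to "avx" instead of selecting "avx2".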
New Test Output (which is better than my previous solution).
@srett commented on GitHub (Sep 18, 2024):
There is no such thing really. The gcc optimization options are just shortcuts to enable a set of extended instructions found in the given CPU. Pick the CPU of your target system, get all the optimizations, instead of researching all the supported extensions manually and adding a dozen command line switches.
In that case it is rather a bug that the tensorflow install script only checks for AVX2 and assumes that means all the other extensions the targeted CPU architecture has will be there too. @keif888 brought up that VIA CPU as a good example.
Another similar case would be AVX512, which Intel removed in more recent desktop CPUs, so assuming it will be present starting from a certain Intel CPU generation would be wrong too.
Actually, in 2021 I prototyped an automatic install script that picks the best library and checks for all the required CPU features that the corresponding -march gcc option enables. :)
@lastzero commented on GitHub (Sep 18, 2024):
The preset used for compilation was "haswell":
It includes the MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND, F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE and HLE instructions. As we have limited resources available for development, we haven't added a dedicated check for all of them. IMHO if the optimized version doesn't work for you because you have a very special CPU or a VM with lacking instruction support, it's easy (and the default, except for the dev environment) to use the standard version. So there is no bug in the version of PhotoPrism provided to end users.
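For illustration, a check covering the whole list above might look like the following sketch. The mapping from the instruction set names to Linux /proc/cpuinfo flag names is my assumption (for example SSE3 appears as pni and LZCNT as abm) and should be verified before relying on it. Some CPUs also report no hle flag because TSX has been disabled, so a strict check could reject hardware that runs the library fine.

```python
# Assumed mapping: haswell preset feature name -> Linux /proc/cpuinfo flag.
HASWELL_FLAGS = {
    "MMX": "mmx", "SSE": "sse", "SSE2": "sse2", "SSE3": "pni",
    "SSSE3": "ssse3", "SSE4.1": "sse4_1", "SSE4.2": "sse4_2",
    "POPCNT": "popcnt", "CX16": "cx16", "SAHF": "lahf_lm",
    "FXSR": "fxsr", "AVX": "avx", "XSAVE": "xsave",
    "PCLMUL": "pclmulqdq", "FSGSBASE": "fsgsbase", "RDRND": "rdrand",
    "F16C": "f16c", "AVX2": "avx2", "BMI": "bmi1", "BMI2": "bmi2",
    "LZCNT": "abm", "FMA": "fma", "MOVBE": "movbe", "HLE": "hle",
}


def missing_haswell_features(flags) -> list:
    """Return the sorted preset feature names whose cpuinfo flag is absent."""
    return sorted(name for name, flag in HASWELL_FLAGS.items() if flag not in flags)
```

An empty result would suggest the optimized build can be used; any entry in the list (such as "FMA" in the VirtualBox case) would point to the standard version instead.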
@keif888 commented on GitHub (Feb 24, 2025):
VirtualBox has been updated to support FMA instructions as of version 7.1.4.
As such this issue is no longer a problem for me.
And as noted by lastzero, the PhotoPrism for end users doesn't have this issue.