Vision: Use "structured output" feature instead of JSON in prompt #2484
Originally created by @Randomblock1 on GitHub (Feb 18, 2026).
What Problem Does This Solve and Why Is It Valuable?
The current Ollama and OpenAI vision implementation injects the expected JSON structure as part of the prompt. The "Structured Output" feature in LLM APIs such as Ollama's and OpenAI's instead forces the model to generate valid JSON by constraining the model's token choices during sampling.
Additionally, because the constraint is built into the token sampler, there is no performance penalty (whereas the current approach pays a small one for the extra prompt tokens), and it improves the accuracy of smaller models.
See https://docs.ollama.com/capabilities/structured-outputs and https://developers.openai.com/api/docs/guides/structured-outputs.
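As a rough illustration of what this would look like against Ollama's `/api/chat` endpoint, here is a minimal Go sketch. The `format` field accepting a JSON schema and the `options.temperature` setting come from the Ollama API docs; the struct names, the `labelSchema` contents, and the `llava` model name are hypothetical, not PhotoPrism code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatRequest mirrors the subset of Ollama's /api/chat request body that
// matters here. Field names follow the Ollama API docs; the struct itself
// is illustrative.
type chatRequest struct {
	Model    string          `json:"model"`
	Messages []chatMessage   `json:"messages"`
	Format   json.RawMessage `json:"format,omitempty"` // JSON schema -> constrained decoding
	Options  map[string]any  `json:"options,omitempty"`
	Stream   bool            `json:"stream"`
}

type chatMessage struct {
	Role    string   `json:"role"`
	Content string   `json:"content"`
	Images  []string `json:"images,omitempty"` // base64-encoded image data
}

// labelSchema is a hypothetical schema for the response PhotoPrism expects.
var labelSchema = json.RawMessage(`{
  "type": "object",
  "properties": {
    "labels": {"type": "array", "items": {"type": "string"}},
    "caption": {"type": "string"}
  },
  "required": ["labels", "caption"]
}`)

// buildVisionRequest assembles a request that relies on structured output
// instead of embedding JSON instructions in the prompt.
func buildVisionRequest(imageB64 string) chatRequest {
	return chatRequest{
		Model: "llava", // placeholder model name
		Messages: []chatMessage{{
			Role:    "user",
			Content: "Describe this photo.", // no JSON instructions needed
			Images:  []string{imageB64},
		}},
		Format:  labelSchema, // the sampler is constrained to this schema
		Options: map[string]any{"temperature": 0},
		Stream:  false,
	}
}

func main() {
	body, _ := json.MarshalIndent(buildVisionRequest("<base64>"), "", "  ")
	fmt.Println(string(body))
}
```

Since the schema guarantees well-formed output, the response body could then be unmarshaled directly into a typed struct without the retry/repair logic that prompt-injected JSON tends to need.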
What Solution Would You Like?
Use the Structured Output feature with a schema instead of injecting JSON generation instructions in the prompt.
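For the OpenAI backend, the equivalent mechanism is the `response_format` field with `"type": "json_schema"` and `"strict": true`, as described in the structured-outputs guide linked above. A hedged Go sketch of the request body (model name and schema are illustrative; `additionalProperties: false` is required by OpenAI in strict mode):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildOpenAIRequest sketches a Chat Completions request body that uses
// structured outputs instead of JSON instructions in the prompt.
func buildOpenAIRequest() map[string]any {
	schema := map[string]any{
		"type": "object",
		"properties": map[string]any{
			"labels":  map[string]any{"type": "array", "items": map[string]any{"type": "string"}},
			"caption": map[string]any{"type": "string"},
		},
		"required":             []string{"labels", "caption"},
		"additionalProperties": false, // required by OpenAI when strict is true
	}
	return map[string]any{
		"model": "gpt-4o", // placeholder model name
		"messages": []map[string]any{
			{"role": "user", "content": "Describe this photo."},
		},
		"response_format": map[string]any{
			"type": "json_schema",
			"json_schema": map[string]any{
				"name":   "photo_labels", // hypothetical schema name
				"strict": true,           // enforce the schema at the sampler level
				"schema": schema,
			},
		},
	}
}

func main() {
	body, _ := json.MarshalIndent(buildOpenAIRequest(), "", "  ")
	fmt.Println(string(body))
}
```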
What Alternatives Have You Considered?
The current method of injecting JSON instructions into the prompt incurs a small performance penalty, and the model can still refuse to generate JSON or produce invalid JSON. It works most of the time, but not all the time, especially with smaller models.
Additional Context
The Ollama docs recommend setting the temperature to 0 for vision tasks; this should probably be done as well.