Vision: Use "structured output" feature instead of JSON in prompt #2484
Originally created by @Randomblock1 on GitHub (Feb 18, 2026).
What Problem Does This Solve and Why Is It Valuable?
The current Ollama and OpenAI vision implementation injects the expected JSON structure as part of the prompt. The "Structured Output" feature in LLM APIs such as Ollama's and OpenAI's instead forces the model to generate valid JSON by constraining the model's token choices during sampling.
Additionally, because the constraint is built into the token sampler, there is no performance penalty (whereas the current approach pays a small one for the extra prompt tokens), and it improves the accuracy of smaller models.
See https://docs.ollama.com/capabilities/structured-outputs and https://developers.openai.com/api/docs/guides/structured-outputs.
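As a rough illustration of what this would look like against Ollama's `/api/chat` endpoint, here is a minimal Go sketch. The `format` field accepting a JSON schema and the `options.temperature` setting come from the Ollama API docs; the struct names, the `labelSchema` contents, and the `llava` model name are hypothetical, not PhotoPrism code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatRequest mirrors the subset of Ollama's /api/chat request body that
// matters here. Field names follow the Ollama API docs; the struct itself
// is illustrative.
type chatRequest struct {
	Model    string          `json:"model"`
	Messages []chatMessage   `json:"messages"`
	Format   json.RawMessage `json:"format,omitempty"` // JSON schema -> constrained decoding
	Options  map[string]any  `json:"options,omitempty"`
	Stream   bool            `json:"stream"`
}

type chatMessage struct {
	Role    string   `json:"role"`
	Content string   `json:"content"`
	Images  []string `json:"images,omitempty"` // base64-encoded image data
}

// labelSchema is a hypothetical schema for the response PhotoPrism expects.
var labelSchema = json.RawMessage(`{
  "type": "object",
  "properties": {
    "labels": {"type": "array", "items": {"type": "string"}},
    "caption": {"type": "string"}
  },
  "required": ["labels", "caption"]
}`)

// buildVisionRequest assembles a request that relies on structured output
// instead of embedding JSON instructions in the prompt.
func buildVisionRequest(imageB64 string) chatRequest {
	return chatRequest{
		Model: "llava", // placeholder model name
		Messages: []chatMessage{{
			Role:    "user",
			Content: "Describe this photo.", // no JSON instructions needed
			Images:  []string{imageB64},
		}},
		Format:  labelSchema, // the sampler is constrained to this schema
		Options: map[string]any{"temperature": 0},
		Stream:  false,
	}
}

func main() {
	body, _ := json.MarshalIndent(buildVisionRequest("<base64>"), "", "  ")
	fmt.Println(string(body))
}
```

Since the schema guarantees well-formed output, the response body could then be unmarshaled directly into a typed struct without the retry/repair logic that prompt-injected JSON tends to need.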
What Solution Would You Like?
Use the Structured Output feature with a schema instead of injecting JSON generation instructions in the prompt.
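For the OpenAI backend, the equivalent mechanism is the `response_format` field with `"type": "json_schema"` and `"strict": true`, as described in the structured-outputs guide linked above. A hedged Go sketch of the request body (model name and schema are illustrative; `additionalProperties: false` is required by OpenAI in strict mode):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildOpenAIRequest sketches a Chat Completions request body that uses
// structured outputs instead of JSON instructions in the prompt.
func buildOpenAIRequest() map[string]any {
	schema := map[string]any{
		"type": "object",
		"properties": map[string]any{
			"labels":  map[string]any{"type": "array", "items": map[string]any{"type": "string"}},
			"caption": map[string]any{"type": "string"},
		},
		"required":             []string{"labels", "caption"},
		"additionalProperties": false, // required by OpenAI when strict is true
	}
	return map[string]any{
		"model": "gpt-4o", // placeholder model name
		"messages": []map[string]any{
			{"role": "user", "content": "Describe this photo."},
		},
		"response_format": map[string]any{
			"type": "json_schema",
			"json_schema": map[string]any{
				"name":   "photo_labels", // hypothetical schema name
				"strict": true,           // enforce the schema at the sampler level
				"schema": schema,
			},
		},
	}
}

func main() {
	body, _ := json.MarshalIndent(buildOpenAIRequest(), "", "  ")
	fmt.Println(string(body))
}
```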
What Alternatives Have You Considered?
The current method of injecting JSON instructions into the prompt incurs a small performance penalty, and the model can still refuse to generate JSON or produce invalid JSON. It works most of the time, but not all the time, especially with smaller models.
Additional Context
The Ollama docs recommend setting the temperature to 0 for vision tasks; this should probably be done as well.