Vision: Support multiple Ollama services and parallel jobs #2453

Open
opened 2026-02-20 01:11:46 -05:00 by deekerman · 1 comment

Originally created by @alexislefebvre on GitHub (Dec 9, 2025).

Confirmation

  • I checked this request against the roadmap and existing issues

What Problem Does This Solve and Why Is It Valuable?

Today we can set up Vision like this:

```yaml
Models:
- Type: caption
  Model: gemma3:latest
  Engine: ollama
  Run: auto
  Prompt: >
    Create a caption with exactly one sentence in the active voice that
    describes the main visual content. Begin with the main subject and
    clear action. Avoid text formatting, meta-language, and filler words.
  Service:
    Uri: http://ollama:11434/api/generate
```

Source: https://docs.photoprism.app/user-guide/ai/ollama-models/#gemma-3-caption

What Solution Would You Like?

I would like to be able to use several instances of Ollama, with something like this:

```yaml
Models:
- Type: caption
  Model: gemma3:latest
  Engine: ollama
  Run: auto
  Prompt: >
    Create a caption with exactly one sentence in the active voice that
    describes the main visual content. Begin with the main subject and
    clear action. Avoid text formatting, meta-language, and filler words.
  Service:
    Uri: http://ollama:11434/api/generate
- Type: caption
  Model: gemma3:latest
  Engine: ollama
  Run: auto
  Prompt: >
    Create a caption with exactly one sentence in the active voice that
    describes the main visual content. Begin with the main subject and
    clear action. Avoid text formatting, meta-language, and filler words.
  Service:
    Uri: http://another.computer.local:11434/api/generate
```

Then PhotoPrism could parallelize the Vision jobs across these two servers.

And if one server is down, it would call the other one instead.

It could also be used to configure a local instance and a SaaS instance.

It may also support a `Priority:` value:

  • if all services have the same priority, they are all called
  • if services have different priorities, the one with the highest priority is called first, and the second is called only if the first is not available

What Alternatives Have You Considered?

.

Additional Context

Well, GPUs are pretty expensive these days, so that won’t be a common situation.

And adding parallelization may be tricky.

Yet I think it would be interesting, even just as a fallback mechanism between several servers.


@lastzero commented on GitHub (Feb 11, 2026):

Interesting idea! We'll consider it when we resume work on Ollama/AI. 🤖
