[BUG]: Recognize LiteLLM model's max context and remove max_tokens #2226

Open
opened 2026-02-28 06:00:20 -05:00 by deekerman · 0 comments
Owner

Originally created by @ringge on GitHub (Mar 8, 2025).

Originally assigned to: @angelplusultra on GitHub.

How are you running AnythingLLM?

AnythingLLM desktop app

What happened?

I'm using document pinning for my workspace.
If I use the model directly (for example Gemini 2.0, or any other model), document pinning works as expected.
However, when using the same model (e.g. Gemini 2.0) through LiteLLM, AnythingLLM seems unable to detect the model's max context window via LiteLLM, so it automatically truncates the document.

Are there known steps to reproduce?

  1. Upload a document, enable document pinning
  2. Set model used as Gemini 2.0, it works
  3. Switch model to LiteLLM, still using Gemini 2.0, the prompt is truncated
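For reference, a minimal sketch of how a client could look up a model's context window from a LiteLLM proxy's `/model/info` endpoint. The response shape and field names here (`data`, `model_info`, `max_input_tokens`) are assumptions based on LiteLLM's proxy documentation and may differ by version; the sample payload is hypothetical:

```python
import json

# Hypothetical sample of a LiteLLM proxy `/model/info` response
# (shape assumed from LiteLLM docs; keys may differ by version).
SAMPLE_RESPONSE = json.dumps({
    "data": [
        {
            "model_name": "gemini-2.0-flash",
            "model_info": {
                "max_input_tokens": 1048576,
                "max_output_tokens": 8192,
            },
        }
    ]
})


def context_window_for(model_name, raw_response):
    """Return the reported max input context for `model_name`, or None."""
    payload = json.loads(raw_response)
    for entry in payload.get("data", []):
        if entry.get("model_name") == model_name:
            # Fall back to None when the proxy does not report a limit,
            # which is the case this issue describes.
            return entry.get("model_info", {}).get("max_input_tokens")
    return None


print(context_window_for("gemini-2.0-flash", SAMPLE_RESPONSE))
```

If the proxy omits `max_input_tokens` (or the client never queries it), the consumer has no choice but to fall back to a conservative default, which would explain the truncation seen above.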