Mirror of https://github.com/Mintplex-Labs/anything-llm.git (synced 2026-03-02)
[FEAT]: Add Support to Anthropic & OpenAI Batch APIs #1622
Originally created by @MichaelYochpaz on GitHub (Oct 15, 2024).
What would you like to see?
Hey there!
First off, thank you for working on this great project :)
Is it possible to add support for Batch APIs, provided by Anthropic and OpenAI?
This feature offers roughly a 50% discount on API calls in exchange for allowing responses to take up to 24 hours (the providers run them when their servers aren't overloaded).
This is useful for saving money on calls that aren't needed immediately (for example, a request to summarize a book, suggest a software design, etc.), especially when using the more expensive models, like Claude Opus and OpenAI o1, with a large context.
The way it works seems to be that once the request is sent, you poll it at an interval (for example, every minute; this could be configurable in the settings) to check whether it has finished, and once it reports that it's ready, you query for the output.
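The polling flow described above could be sketched as a small helper. This is a hypothetical illustration, not AnythingLLM code: `checkStatus` stands in for whatever wraps the provider's status endpoint (e.g. OpenAI's `batches.retrieve` or Anthropic's message-batches retrieval), and the interval/attempt limits are made-up defaults.

```javascript
// Minimal polling sketch (hypothetical helper, names are illustrative).
// `checkStatus` is any async function returning { done, output }, e.g. a
// thin wrapper around the provider's "retrieve batch" endpoint.
async function pollBatch(checkStatus, { intervalMs = 60_000, maxAttempts = 1440 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { done, output } = await checkStatus();
    if (done) return output; // batch finished; hand the result to the UI
    // Wait before polling again so we don't hammer the provider.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Batch did not complete within the polling window");
}
```

With a 60-second interval and 1440 attempts this covers the 24-hour window the providers quote; both numbers would presumably be user-configurable.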
This probably won't be simple, since it requires implementing a new mechanism for waiting on a response (polling) and a way to communicate the pending state in the UI (maybe a spinner showing the response hasn't been generated yet), but I do think it would be a great, very useful addition.
Plus, since OpenAI introduced it and Anthropic has now followed, we might see similar APIs from other providers; so if this is implemented, writing generic code that can support other similar APIs in the future might be a good idea.
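The "generic code" idea above could take the shape of a small provider-agnostic interface. This is purely an illustrative sketch (the class and method names are invented): each provider adapter implements the same three operations, so the polling loop and UI logic stay identical regardless of whether the batch runs on OpenAI or Anthropic.

```javascript
// Hypothetical provider-agnostic batch adapter (illustrative names).
// Concrete subclasses would wrap OpenAI's Batch API or Anthropic's
// Message Batches API behind the same three calls.
class BatchProvider {
  // Submit messages as a batch; returns an opaque batch id.
  async submit(messages) { throw new Error("not implemented"); }
  // Report the batch's state: "pending" | "completed" | "failed".
  async status(batchId) { throw new Error("not implemented"); }
  // Fetch the model output once status() reports "completed".
  async result(batchId) { throw new Error("not implemented"); }
}
```

The polling and UI code would then depend only on this interface, so adding a future provider means writing one adapter rather than touching the chat flow.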
@timothycarambat commented on GitHub (Oct 30, 2024):
Following up on this: since the natural expectation when chatting through a UI is to send a message and receive a result within a reasonable timeframe, how could batching help with our use case? Sending a request and getting back "we will return a response later" may be frustrating or useless to many users.
Can you expand with a specific use case in mind? I am not seeing clear value for users in getting responses minutes or possibly hours after the request.
@MichaelYochpaz commented on GitHub (Oct 31, 2024):
Hey @timothycarambat, thank you for responding :)
This feature is not really intended for general chatting with simple short questions (which are quite cheap anyway), but for more complex ones that include a massive context and might be used with more expensive models (like o1-preview, for example), where a 50% discount is quite meaningful and could save a few bucks on a single request.
The specific use case I'd like to use it for:
This type of prompt includes a huge context that will cost a meaningful amount of money (especially for models like o1-preview and o1-mini which are expensive), and I wouldn't mind getting the results a few hours later for saving a decent amount of money.
Another possible case that comes to mind is asking it to write a summary about a topic while adding several relevant books / research papers as context.
Now, I don't think this should apply to the entire chat; there should be a per-message option. For example, if I have a follow-up question after getting the result and don't want to wait again, I could untick a "batch request" checkbox, and the next message would be sent as a regular API request, without the header marking it as a batch request.
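The per-message toggle could be as simple as a flag on each outgoing message that picks the dispatch path. A minimal sketch, with invented names (`useBatch`, `dispatchMessage`) standing in for whatever the chat handler actually exposes:

```javascript
// Hypothetical per-message dispatch (illustrative names). A UI checkbox
// sets `useBatch` on each outgoing message; unticking it sends a normal
// synchronous request instead of a cheap-but-slow batch request.
function dispatchMessage(message, { useBatch = false } = {}) {
  if (useBatch) {
    // Batch path: cheaper, but the response may take up to 24 hours.
    return { route: "batch", message };
  }
  // Default path: regular synchronous API call, immediate response.
  return { route: "sync", message };
}
```

Because the flag travels with the message rather than the chat, a batched question and its non-batched follow-up can coexist in the same thread.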