starred/anything-llm

mirror of https://github.com/Mintplex-Labs/anything-llm.git synced 2026-03-02 22:57:05 -05:00

[FEAT]: Support custom or downloadable ASR models for Meeting Assistant (e.g., Whisper, VibeVoice-ASR) #3113

Open

opened 2026-02-28 06:30:35 -05:00 by deekerman · 3 comments

deekerman commented

2026-02-28 06:30:35 -05:00

Originally created by @skyking363 on GitHub (Jan 23, 2026).

What would you like to see?

Description

I would like to suggest adding the ability for users to download or load custom ASR models for the Meeting Assistant.

Currently, the default parakeet-tdt-0.6b-v3 is extremely fast but supports only 25 European languages. To support other languages, such as Chinese (Traditional/Simplified), users need the flexibility to choose more capable models.

Use Case

Many users in Asia require high-accuracy transcription for CJK languages. By allowing us to point to a specific model, we can balance speed and language support according to our own hardware and needs.
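As a rough illustration of that speed/language trade-off, here is a minimal sketch assuming a simple hypothetical rule: keep the fast default model when it covers the meeting language, and otherwise fall back to a multilingual model. The function `pick_asr_model`, the fallback ID `openai/whisper-large-v3`, and the short language set are assumptions for illustration, not AnythingLLM's actual behavior.

```python
# Hypothetical routing rule: keep the fast default model when it covers the
# meeting language, otherwise fall back to a multilingual model.
# Model IDs and the language subset below are illustrative assumptions.

DEFAULT_MODEL = "nvidia/parakeet-tdt-0.6b-v3"
MULTILINGUAL_FALLBACK = "openai/whisper-large-v3"

# Illustrative subset only; the default model's full list has 25 European languages.
DEFAULT_MODEL_LANGUAGES = {"en", "fr", "de", "es", "it"}


def pick_asr_model(language_code: str) -> str:
    """Return the model ID to use for a meeting in the given ISO 639-1 language."""
    if language_code.lower() in DEFAULT_MODEL_LANGUAGES:
        return DEFAULT_MODEL
    # Languages outside the default model's coverage (e.g. "zh", "ja").
    return MULTILINGUAL_FALLBACK
```

With this rule, `pick_asr_model("en")` keeps the fast default, while `pick_asr_model("zh")` switches to the multilingual fallback. A real setting would more likely let users choose the fallback themselves, based on their own hardware.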

Proposed Solution

Instead of a hardcoded model, please consider adding a "Model Settings" section in the Meeting Assistant:

1. Downloadable Presets: Allow users to choose and download different versions of Whisper (Tiny/Base/Small/Medium/Large).
2. Custom Model ID/Path: Allow users to input a Hugging Face model ID or a local path to load specific ASR models, such as:
   - Whisper variants (for broader language support)
   - VibeVoice-ASR (https://huggingface.co/microsoft/VibeVoice-ASR), which shows great potential for diverse tasks.

This would make the Meeting Assistant truly "Anything" and accessible to a global audience.

Thank you for your hard work on this amazing tool!

deekerman added labels 2026-02-28 06:30:35 -05:00

deekerman commented

2026-02-28 06:30:41 -05:00

@tylerrobb commented on GitHub (Jan 23, 2026):

You read my mind! I was browsing the Hugging Face Open ASR Leaderboard and wanted to test out nvidia/canary-qwen-2.5b or nvidia/canary-1b-v2 instead.

https://huggingface.co/spaces/hf-audio/open_asr_leaderboard

deekerman commented

2026-02-28 06:30:41 -05:00

@laweschan commented on GitHub (Jan 23, 2026):

Update: although it is not OpenAI ASR-compatible, the following model seems to rank quite well in testing and supports multiple languages. Please consider it as well:

https://github.com/QwenLM/Qwen3-ASR

~~Yes, and I would recommend considering SenseVoice, as it is well regarded for Chinese + Cantonese + English STT.~~

~~https://github.com/FunAudioLLM/SenseVoice~~

deekerman commented

2026-02-28 06:30:41 -05:00

@LeoThomassey commented on GitHub (Feb 19, 2026):

Parakeet might be good for English, but it lacks accuracy in French, for example. Whisper Large is much better for multilingual content, but I can't use any other model because I use Vulkan and have no NVIDIA GPU.

Everything in AnythingLLM is customizable except this setting, which makes the whole Meeting Assistant useless for non-English content.
