[BUG]: File Parsing Fails for URLs Without Explicit File Extensions #2897

Open
opened 2026-02-28 06:23:03 -05:00 by deekerman · 1 comment

Originally created by @angelplusultra on GitHub (Oct 8, 2025).

Originally assigned to: @timothycarambat on GitHub.

How are you running AnythingLLM?

All versions

What happened?

When pulling and parsing a file from a URL via either the RAG Modal or @agent mode, the process fails if the URL does not end with an explicit file extension (e.g., .pdf, .csv). This happens even when the server responds with a correct Content-Type header that identifies the file type.

Example:

The following URL fails to process, even though the server responds with an application/pdf Content-Type:

https://arxiv.org/pdf/2307.10265

Observed Behavior:

The application logs display the following error:


[2] Error processing single file File extension .10265 not supported for parsing and cannot be assumed as text file type.

This error originates from the file extension guard located at:

https://github.com/Mintplex-Labs/anything-llm/blob/89a01492b51a23150b59732166b90ebdd1843c50/collector/processSingleFile/index.js#L58-L72
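To make the failure mode concrete, here is a hypothetical sketch of a guard that derives the extension only from the last segment of the URL path. It is not the code at the linked lines; the names SUPPORTED_EXTENSIONS, extensionFromUrl and guardByExtension, and the extension list, are assumptions for illustration. It shows why 2307.10265 is read as having the extension .10265: Node's path.extname treats everything after the final dot as the extension.

```javascript
// Hypothetical illustration only; NOT the actual code in
// collector/processSingleFile/index.js. All names here are assumed.
const path = require("path");

// Assumed subset of extensions a collector might accept.
const SUPPORTED_EXTENSIONS = new Set([".pdf", ".csv", ".txt", ".md", ".docx"]);

// Derive an "extension" purely from the URL path, ignoring response headers.
function extensionFromUrl(url) {
  const { pathname } = new URL(url); // query string and fragment are excluded
  return path.posix.extname(pathname).toLowerCase();
}

// Reject anything whose path-derived extension is not in the allowed set.
// The message mirrors the logged error text for comparison.
function guardByExtension(url) {
  const ext = extensionFromUrl(url);
  if (!SUPPORTED_EXTENSIONS.has(ext)) {
    throw new Error(
      `File extension ${ext} not supported for parsing and cannot be assumed as text file type.`
    );
  }
  return ext;
}
```

With this logic, extensionFromUrl("https://arxiv.org/pdf/2307.10265") returns ".10265", so the guard rejects the file before the application/pdf Content-Type is ever consulted.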

Expected Behavior:

The system should be able to successfully pull and parse files from URLs that do not explicitly contain a file extension, provided the Content-Type header in the server's response clearly indicates the file's MIME type.
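One possible approach, sketched under assumptions: keep the URL's extension when it is already a supported one, and otherwise derive the extension from the response's Content-Type header. The MIME table, SUPPORTED_EXTENSIONS and resolveExtension are hypothetical names. A real fix would also need to apply the resolved extension wherever the downloaded file is named before it reaches the guard, and could use a maintained MIME library such as mime-types instead of a hand-written table.

```javascript
// Hypothetical sketch: resolve a file extension from the URL, falling back to
// the Content-Type header. Names and the MIME table are assumptions, not
// AnythingLLM's actual implementation.
const path = require("path");

// Assumed MIME-to-extension table (illustrative subset only).
const MIME_TO_EXTENSION = new Map([
  ["application/pdf", ".pdf"],
  ["text/csv", ".csv"],
  ["text/plain", ".txt"],
  ["text/markdown", ".md"],
  ["application/vnd.openxmlformats-officedocument.wordprocessingml.document", ".docx"],
]);

// Every extension this sketch knows how to parse.
const SUPPORTED_EXTENSIONS = new Set(MIME_TO_EXTENSION.values());

// Drop parameters such as "; charset=utf-8" and normalize case.
function mimeFromContentType(contentType) {
  if (!contentType) return null;
  return contentType.split(";")[0].trim().toLowerCase();
}

// 1. Use the URL's extension if it is already a supported one.
// 2. Otherwise map the Content-Type header to an extension.
// 3. Otherwise fail with an explicit error.
function resolveExtension(url, contentType) {
  const urlExt = path.posix.extname(new URL(url).pathname).toLowerCase();
  if (SUPPORTED_EXTENSIONS.has(urlExt)) return urlExt;

  const mime = mimeFromContentType(contentType);
  const headerExt = mime ? MIME_TO_EXTENSION.get(mime) : undefined;
  if (headerExt) return headerExt;

  throw new Error(
    `Cannot determine a supported file type for ${url} (Content-Type: ${contentType})`
  );
}
```

Under this sketch, resolveExtension("https://arxiv.org/pdf/2307.10265", "application/pdf") resolves to ".pdf". Checking the URL first keeps behavior unchanged for links that already work, and it still handles servers that send a generic type such as application/octet-stream for files whose URL does carry an extension.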


Are there known steps to reproduce?

No response


@Guru6163 commented on GitHub (Oct 8, 2025):

I would like to work on this @timothycarambat
