[BUG]: addDocumentToNamespace LanceDBError: Append with different schema #609

Closed
opened 2026-02-28 04:52:08 -05:00 by deekerman · 2 comments
Owner

Originally created by @majestichou on GitHub (Mar 28, 2024).

How are you running AnythingLLM?

Docker (local)

What happened?

I use the sfr-embedding-mistral:q4_k_m embedding model and LanceDB. I uploaded a document to the workspace and clicked the "Save and Embed" button. The error occurred.
[OllamaEmbedder] Embedding 2 chunks of text with sfr-embedding-mistral:q4_k_m.
Inserting vectorized chunks into LanceDB collection.

addDocumentToNamespace LanceDBError: Append with different schema: original=Field(id=0, name=vector, type=fixed_size_list:float:384)
Field(id=1, name=id, type=string)
Field(id=2, name=url, type=string)
Field(id=3, name=title, type=string)
Field(id=4, name=docAuthor, type=string)
Field(id=5, name=description, type=string)
Field(id=6, name=docSource, type=string)
Field(id=7, name=chunkSource, type=string)
Field(id=8, name=published, type=string)
Field(id=9, name=wordCount, type=double)
Field(id=10, name=token_count_estimate, type=double)
Field(id=11, name=text, type=string)
 new=Field(id=0, name=vector, type=fixed_size_list:float:4096)
Field(id=1, name=id, type=string)
Field(id=2, name=url, type=string)
Field(id=3, name=title, type=string)
Field(id=4, name=docAuthor, type=string)
Field(id=5, name=description, type=string)
Field(id=6, name=docSource, type=string)
Field(id=7, name=chunkSource, type=string)
Field(id=8, name=published, type=string)
Field(id=9, name=wordCount, type=double)
Field(id=10, name=token_count_estimate, type=double)
Field(id=11, name=text, type=string)

As far as I know, the output of the sfr-embedding-mistral:q4_k_m model has dimension 4096, which cannot be changed. Can the data storage dimension of the LanceDB database be changed?
Maybe changing the data storage dimension from 384 to 4096 will fix this?
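
The two schemas in the error message differ only in the width of the `vector` field. As a minimal sketch, the Python below parses the `Field(...)` entries as they are printed in the log and reports which fields changed. The helper names `parse_fields` and `diff_schemas` are made up for illustration and are not part of LanceDB or AnythingLLM. The sample input is shortened to three fields from the log.

```python
import re

# Matches one field as it is printed in the error message above.
FIELD_RE = re.compile(r"Field\(id=(\d+), name=(\w+), type=([^)]+)\)")


def parse_fields(schema_text):
    """Return {field name: type string} for every Field(...) entry in the text."""
    return {name: ftype for _id, name, ftype in FIELD_RE.findall(schema_text)}


def diff_schemas(original_text, new_text):
    """Return {name: (original type, new type)} for fields whose types differ."""
    original = parse_fields(original_text)
    new = parse_fields(new_text)
    return {
        name: (original.get(name), new.get(name))
        for name in sorted(set(original) | set(new))
        if original.get(name) != new.get(name)
    }


# Shortened to three fields; the remaining fields in the log are identical on both sides.
original = (
    "Field(id=0, name=vector, type=fixed_size_list:float:384)\n"
    "Field(id=1, name=id, type=string)\n"
    "Field(id=9, name=wordCount, type=double)"
)
new = (
    "Field(id=0, name=vector, type=fixed_size_list:float:4096)\n"
    "Field(id=1, name=id, type=string)\n"
    "Field(id=9, name=wordCount, type=double)"
)

print(diff_schemas(original, new))
# {'vector': ('fixed_size_list:float:384', 'fixed_size_list:float:4096')}
```

Only `vector` is reported: the existing table stores 384-wide vectors, while the new chunks are 4096 wide.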

Are there known steps to reproduce?

No response

deekerman 2026-02-28 04:52:08 -05:00
Author
Owner

@timothycarambat commented on GitHub (Mar 28, 2024):

You cannot create a workspace and embed documents with one embedder, unembed all documents, swap embedders, and re-embed into that same workspace. The LanceDB schema is not removed when the last document is un-embedded (maybe it should be, to prevent this issue).

Just make a new workspace and it will work 👍

The default embedder produces 384-dimensional vectors, so that is likely what happened here.
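
The behaviour described in this comment can be illustrated with a toy model. The class below is a hypothetical stand-in, not the LanceDB API: it assumes only what the comment states, namely that the vector width is fixed by the first insert and survives removal of all rows.

```python
class FixedWidthTable:
    """Toy stand-in for a vector table. Hypothetical; NOT the LanceDB API."""

    def __init__(self):
        self.dim = None  # vector width, fixed by the first successful insert
        self.rows = []

    def add(self, vectors):
        """Append vectors, rejecting any whose width differs from the table's."""
        if not vectors:
            return
        dim = self.dim if self.dim is not None else len(vectors[0])
        for vec in vectors:
            if len(vec) != dim:
                raise ValueError(
                    f"Append with different schema: original={dim}, new={len(vec)}"
                )
        self.dim = dim
        self.rows.extend(vectors)

    def remove_all(self):
        """Delete every row. The width is kept, mirroring the retained schema."""
        self.rows.clear()


workspace = FixedWidthTable()
workspace.add([[0.0] * 384])       # embedded with a 384-dimensional embedder
workspace.remove_all()             # un-embed everything; the width is still 384

try:
    workspace.add([[0.0] * 4096])  # re-embed with a 4096-dimensional embedder
except ValueError as err:
    print(err)

fresh_workspace = FixedWidthTable()
fresh_workspace.add([[0.0] * 4096])  # a new workspace accepts the new width
```

The last two lines correspond to the suggested workaround: a new workspace has no stored width yet, so the 4096-dimensional vectors are accepted.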

Author
Owner

@majestichou commented on GitHub (Mar 28, 2024):

> You cannot create a workspace and embed documents with one embedder, unembed all documents, swap embedders, and re-embed into that same workspace. The LanceDB schema is not removed when the last document is un-embedded (maybe it should be, to prevent this issue).
>
> Just make a new workspace and it will work 👍
>
> Default embedder is 384, so that is likely what happened here

YES! Making a new workspace works 👍 Thanks a lot! You did a great job!
