[BUG/FEAT]: Delete documents under “My Documents” after changing the embedding model #1777

Closed
opened 2026-02-28 05:40:29 -05:00 by deekerman · 6 comments

Originally created by @joomgallerytestit on GitHub (Dec 1, 2024).

Originally assigned to: @shatfield4 on GitHub.

Hi,

I don't understand how the workspaces and the functions provided there are implemented.

Question #1:
On the left, under “My Documents”, I can create folders and assign names to them.
In my case, I created the folders “Testfolder” and “Testfolder2” in addition to the existing “custom-documents” folder. Despite multiple attempts, I have not been able to get documents uploaded via the “Click to upload or drag and drop” area into these subfolders. They always end up in “custom-documents”, and I cannot drag and drop them from there into “Testfolder” or “Testfolder2”.
Where can I find more detailed instructions for creating named folders under “My Documents”, so that I can structure my source documents and upload them into the appropriate folder?

Question #2:
The actual workspace is on the right. To get one or more of the documents from “My Documents” there, the relevant document from “My Documents” must be selected and then “Move to Workspace” clicked. Vectorization can then be started by clicking on “Save and Embed”.

Before using it as a local RAG system, I need to find the right combination of LLM (LM Studio) and embedding model (Ollama). That requires a large number of tests, including tests of different embedding models via Ollama.

Problem:
When testing a new embedding model, you have to reset the existing vector database with “Reset Vector Database” and repeat the embedding process with the new model. In my opinion, it should be sufficient to delete all documents from the workspace on the right, reselect the relevant documents under “My Documents” on the left, and click “Move to Workspace”, so that the documents are moved back into the workspace and vectorized by the new embedding model under test (“Save and Embed”).
This procedure, which is what I would expect to work, does NOT work!

Afterwards, the “Max Context Snippets” and “Vector Count” values in the workspace are unchanged, even though I always deliberately change “Text Chunk Size” beforehand when testing a different embedding model.
The vector database is only rebuilt with the new embedding model after I delete the relevant documents under “My Documents” on the left, upload them again, move them to the right with “Move to Workspace”, and click “Save and Embed”.

As I understand it, this should not be necessary. It should work as follows: once I delete the already-embedded documents from the workspace on the right, select them again under “My Documents” on the left, move them into the workspace, and click “Save and Embed”, they should be vectorized with the new embedding model.

Thanks in advance and kind regards
Joomgallerytestit

Screenshot: anythingllm-workspace (https://github.com/user-attachments/assets/889bc1eb-e7df-4485-beb6-856b20dc04ef)


@joomgallerytestit commented on GitHub (Dec 6, 2024):

#BUG


@joomgallerytestit commented on GitHub (Dec 11, 2024):

It is still not clear why everything has to be deleted under ‘My Documents’ when it has already been removed from the workspace on the right-hand side. The documents in ‘My Documents’ only become part of the workspace after ‘Move to Workspace’, and only then are they vectorised by ‘Save and Embed’. In addition, as described, I am still unable to organise documents into different folders in ‘My Documents’.

Kind regards


@timothycarambat commented on GitHub (Dec 11, 2024):

The documents should not be deleted, but the cached embeddings and the vectors stored in the tables should be, because there is no guarantee that the old and new embedding models produce vectors of the same dimension. If they do not, all upserts and similarity searches will fail with a dimension mismatch.
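To illustrate the dimension-mismatch point: similarity search compares a query vector against stored vectors component by component, so both must have the same length. A minimal sketch in plain Python, assuming a stored vector from a hypothetical 768-dimensional model and a query from a hypothetical 1024-dimensional model; real vector databases perform this check internally and report their own errors.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    if len(a) != len(b):
        # Vectors of different dimensions cannot be compared at all.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical: vector stored with the old model, query embedded with the new one.
stored_vector = [0.1] * 768
query_vector = [0.1] * 1024

try:
    cosine_similarity(query_vector, stored_vector)
except ValueError as err:
    print(err)  # dimension mismatch: 1024 vs 768
```

This is why vectors created with the old model have to be cleared rather than mixed with vectors from the new one.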


@joomgallerytestit commented on GitHub (Dec 11, 2024):

Hi,

As I have explained several times: with AnythingLLM on my Windows 10 system, after changing the embedding model (via Ollama), I have to delete the documents under 'My Documents' before the vector database is filled with data from the newly tested embedding model.

Before doing this, I reset the vector database. In my opinion, you shouldn't have to delete anything on the LEFT side under "My Documents", only on the RIGHT side in the workspace!

If, after integrating a new embedding model via Ollama and resetting the vector database, I delete everything on the right side and then run 'Save and Embed', the vector database is evidently filled with the old data again.

Only when I delete the documents in 'My Documents' on the left, upload them again, move them to the workspace, and click 'Save and Embed' is the vector database filled using the new embedding model.
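The behaviour reported here is consistent with the maintainer's remark about cached embeddings: if cached vectors are looked up by document only, re-adding the same document reuses the old vectors regardless of the selected model, while deleting and re-uploading the document forces a cache miss. A minimal sketch, assuming such a cache; `EmbeddingCache`, `fake_embed_for`, the document path, and the model names are hypothetical and are not AnythingLLM's actual implementation. Adding the model name to the cache key is one way to make a model change trigger re-embedding.

```python
from typing import Callable

Embedder = Callable[[str], list[float]]


class EmbeddingCache:
    """Hypothetical cache of previously computed document vectors."""

    def __init__(self, include_model_in_key: bool) -> None:
        self.include_model_in_key = include_model_in_key
        self._store: dict[tuple[str, ...], list[float]] = {}
        self.embed_calls = 0

    def _key(self, doc_path: str, model: str) -> tuple[str, ...]:
        # Keyed by document only, a model change does not invalidate entries.
        return (doc_path, model) if self.include_model_in_key else (doc_path,)

    def get_or_embed(self, doc_path: str, text: str, model: str, embed: Embedder) -> list[float]:
        key = self._key(doc_path, model)
        if key not in self._store:
            self.embed_calls += 1
            self._store[key] = embed(text)
        return self._store[key]

    def forget(self, doc_path: str) -> None:
        # Roughly analogous to deleting and re-uploading the source document.
        self._store = {k: v for k, v in self._store.items() if k[0] != doc_path}


def fake_embed_for(dim: int) -> Embedder:
    # Stand-in for a real embedding model returning `dim`-dimensional vectors.
    return lambda text: [float(len(text))] * dim


doc, text = "custom-documents/report.pdf", "example text"

naive = EmbeddingCache(include_model_in_key=False)
naive.get_or_embed(doc, text, "model-a", fake_embed_for(768))
stale = naive.get_or_embed(doc, text, "model-b", fake_embed_for(1024))
print(len(stale))  # 768: the old model's vectors are reused

naive.forget(doc)  # the "delete and re-upload" workaround
print(len(naive.get_or_embed(doc, text, "model-b", fake_embed_for(1024))))  # 1024

keyed = EmbeddingCache(include_model_in_key=True)
keyed.get_or_embed(doc, text, "model-a", fake_embed_for(768))
fresh = keyed.get_or_embed(doc, text, "model-b", fake_embed_for(1024))
print(len(fresh))  # 1024: a model change triggers re-embedding
```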

Also, it is still a mystery to me how, for example, I can create three folders with different documents in 'My Documents'. Drag and drop or similar methods do not work. Unfortunately, the manual is very sparse on this point.

Kind regards
joomgallerytestit


@timothycarambat commented on GitHub (Dec 11, 2024):

We are saying the same thing.


@joomgallerytestit commented on GitHub (Dec 14, 2024):

Hi,

thank you for looking into the problem I described. I have one more request: it is still a mystery to me how I can, for example, create three folders containing different documents in "My Documents". Drag and drop and similar methods do not work, so I am still unable to create a hierarchically organised workspace.

Kind regards
Joomgallerytestit

P.S.: I GOT IT NOW! :-))

Reference
starred/anything-llm#1777