mirror of
https://github.com/Mintplex-Labs/anything-llm.git
synced 2026-03-02 22:57:05 -05:00
[BUG/FEAT]: Delete documents under “My Documents” after changing the embedding model #1777
Originally created by @joomgallerytestit on GitHub (Dec 1, 2024).
Originally assigned to: @shatfield4 on GitHub.
Hi,
I don't understand how the workspaces and the functions provided there are implemented.
Question #1:
On the left, under “My Documents”, I can create folders and assign names to them.
In my case, I created the folders “Testfolder” and “Testfolder2” in addition to the already existing “custom-documents” folder. However, despite multiple attempts, I have not been able to get documents uploaded via the “Click to upload or drag and drop” area into these subfolders. They always end up in the “custom-documents” directory and cannot be moved from there to either “Testfolder” or “Testfolder2” using drag and drop.
Where can I find more detailed instructions that will help me solve the above problem so that I can create folders with appropriate names under “My Documents” to structure the source documents and upload the desired source documents to them?
Question #2:
The actual workspace is on the right. To get one or more of the documents from “My Documents” there, the relevant document from “My Documents” must be selected and then “Move to Workspace” clicked. Vectorization can then be started by clicking on “Save and Embed”.
Before using it as a local RAG system, the appropriate combination of LLM model (LM Studio) and embedding model (Ollama) must be found, which means that a large number of tests have to be carried out. This also includes testing different embedding models (via Ollama).
Problem:
If you test a new embedding model, you have to reset the existing vector database with “Reset Vector Database” and repeat the embedding process with the new embedding model. In my opinion, it should be sufficient to delete all documents in the workspace on the right, then reselect the relevant documents on the left under “My Documents”, then click on “Move to Workspace” so that the documents move to the workspace on the right again to be vectorized by the new embedding model (to be tested) (Save and Embed).
Exactly this procedure, which is in line with expectations, does NOT work!
Afterwards, a glance at “Max Context Snippets” and “Vector Count” in the relevant workspace shows that the values have not changed, even though I always deliberately change the value of “Text Chunk Size” beforehand when testing a different embedding model.
Only after I delete the relevant documents on the left under “My Documents”, upload them again, then move them to the right using “Move to workspace” and then click on “Save and Embed” is the vector database generated and saved by the new embedding model.
As I understand it, this should not be the case, but should work as follows: as soon as I delete the documents that have already been embedded on the right-hand side of the workspace, then click on these documents again on the left under “My Documents”, move them into the workspace and click on “save and embed”, this process should be carried out using the new embedding model and the data should be vectorized accordingly.
Thanks in advance and kind regards
Joomgallerytestit
@joomgallerytestit commented on GitHub (Dec 6, 2024):
#BUG
@joomgallerytestit commented on GitHub (Dec 11, 2024):
It is still not clear why everything should have to be deleted under ‘My Documents’ if it has already been deleted on the right-hand side in the workspace. The documents in ‘My Documents’ only become part of the workspace after ‘Move to Workspace’, and only then are they vectorised by ‘Save and Embed’. In addition, as described, I am unable to define document hierarchies using different folders in ‘My Documents’.
Kind regards
@timothycarambat commented on GitHub (Dec 11, 2024):
The documents should not be deleted, but the cached embeddings and the vectors stored in the tables should be, because there is no guarantee that the new embedding model's dimensions are the same as the old one's. If they are not, all upserts and similarity searches will fail due to a dimension mismatch.
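To make the dimension-mismatch point concrete, here is a minimal sketch in plain Python. It is only an illustration, not AnythingLLM's or any vector database's actual code: the `cosine_similarity` helper and the 768 and 384 dimensions are assumptions chosen for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; refuses vectors of different length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical sizes: a vector stored by the old embedding model and a
# query vector produced by the newly selected embedding model.
stored_vector = [0.1] * 768
query_vector = [0.1] * 384

try:
    cosine_similarity(stored_vector, query_vector)
    mismatch_detected = False
    mismatch_message = ""
except ValueError as err:
    mismatch_detected = True
    mismatch_message = str(err)

print(mismatch_detected, mismatch_message)
```

Vectors of equal length compare without error, which is why old vectors and cached embeddings have to be cleared before content embedded by a different model is searched.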
@joomgallerytestit commented on GitHub (Dec 11, 2024):
Hi,
I have explained several times that on my Windows 10 system in AnythingLLM, after changing the embed model provider (via Ollama), I have to delete the documents under 'My Documents' in order for the vector database to be filled with the new data of a newly tested embedding model.
Before doing this, I reset the vector database. In my opinion, you shouldn't have to delete anything on the LEFT side of "My Documents", but only on the RIGHT side of the workspace!
If I delete everything on the right side after I've integrated a new embedding model via Ollama and reset the vector database, and then start 'Save and Embed', the vector database is obviously filled with the old data again.
Only when I delete the documents in 'My Documents' on the left, upload them again, move them to the workspace and click 'Save and Embed' does the vector database get filled by the new embed server model.
Also, it is still a mystery to me how, for example, I can create three folders with different documents in 'My Documents'. Drag and drop or similar methods do not work. Unfortunately, the manual is very sparse on this point.
Kind regards
joomgallerytestit
@timothycarambat commented on GitHub (Dec 11, 2024):
We are saying the same thing
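For readers following along, the behaviour discussed above (old vectors reappearing after a model change) can be illustrated with a small, purely hypothetical embedding cache in Python. This is not how AnythingLLM implements its cache; `EmbeddingCache`, `fake_model_a` and `fake_model_b` are invented for this sketch. It shows one way a cache could avoid reusing stale embeddings: by including the model name in the cache key.

```python
import hashlib


class EmbeddingCache:
    """Hypothetical cache keyed by (model name, content hash)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(text, model_name):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return (model_name, digest)

    def get_or_embed(self, text, model_name, embed_fn):
        # Only call the embedder when this (model, text) pair is not cached yet.
        key = self._key(text, model_name)
        if key not in self._store:
            self._store[key] = embed_fn(text)
        return self._store[key]


# Stand-in embedders with different output dimensions (assumptions).
def fake_model_a(text):
    return [float(len(text))] * 4


def fake_model_b(text):
    return [float(len(text))] * 8


cache = EmbeddingCache()
vec_a = cache.get_or_embed("hello", "model-a", fake_model_a)
vec_b = cache.get_or_embed("hello", "model-b", fake_model_b)
print(len(vec_a), len(vec_b))
```

If the key contained only the content hash, the second call would return the cached 4-dimensional vector from the first model, which matches the symptom described in this thread.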
@joomgallerytestit commented on GitHub (Dec 14, 2024):
Hi,
thank you for looking into the problem I described. I have one more request: It is still a mystery to me how I can, for example, create three folders with different documents in "My Documents". Drag and drop or similar methods do not work. Therefore, I am currently still unable to create a hierarchically organised workspace.
Kind regards
Joomgallerytestit
P. S.: I GOT IT NOW! :-))