Workspaces include "non-embedded" files #56

Closed
opened 2026-02-28 04:23:26 -05:00 by deekerman · 8 comments

Originally created by @danglingptr0x0 on GitHub (Jun 19, 2023).

I have noticed that one of my workspaces cites files that are not embedded in it. I double-checked its embedded files, restarted both the server and the UI, and retried, but it happened anyway. The files in question have been vectorized, but they are not in the workspace, so maybe it somehow reads all vectorized files, not only the embedded ones?

deekerman closed this issue and added the `question` label (2026-02-28 04:23:26 -05:00).

@timothycarambat commented on GitHub (Jun 19, 2023):

The embeddings are separated by `namespace` or `collection` per workspace. So if you are using Pinecone, for example, it will only search across documents/embeddings in that specific namespace.

Are you saying you have embedded documents that are appearing in other workspaces?
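
For illustration only, here is a minimal sketch of what per-workspace namespace scoping means. `NamespacedStore`, `cosine`, and the workspace and file names are hypothetical stand-ins, not Pinecone's or AnythingLLM's actual API; the point is just that a query only ever looks at vectors stored under the namespace it is given.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class NamespacedStore:
    """Toy in-memory vector store (hypothetical, not a real client library)."""

    def __init__(self):
        # namespace -> list of (doc_id, vector)
        self._vectors = defaultdict(list)

    def upsert(self, namespace, doc_id, vector):
        self._vectors[namespace].append((doc_id, vector))

    def query(self, namespace, vector, top_k=3):
        # Only vectors stored under `namespace` are candidates.
        candidates = self._vectors.get(namespace, [])
        scored = sorted(
            ((cosine(vector, v), doc_id) for doc_id, v in candidates),
            reverse=True,
        )
        return [doc_id for _, doc_id in scored[:top_k]]


store = NamespacedStore()
store.upsert("workspace-a", "a-notes.txt", [1.0, 0.0])
store.upsert("workspace-b", "b-report.pdf", [1.0, 0.1])

# Searching workspace-a never sees workspace-b's vectors.
hits_a = store.query("workspace-a", [1.0, 0.0])
print(hits_a)
```

If the separation works as intended, a citation from another workspace should only be possible when its vectors were written into the wrong namespace in the first place.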


@danglingptr0x0 commented on GitHub (Jun 20, 2023):

> Are you saying you have embedded documents that are appearing in other workspaces?

Precisely, yes.


@timothycarambat commented on GitHub (Jun 26, 2023):

When this problem occurs, is it referencing source documents in the reply? It may be returning general information, as if you had just asked ChatGPT a question.

It **must** return the source documents it got information from, so that would determine whether it is using information from other workspaces.


@danglingptr0x0 commented on GitHub (Jun 27, 2023):

> When this problem occurs, is it referencing source documents in the reply? It may be returning general information, as if you had just asked ChatGPT a question.
>
> It **must** return the source documents it got information from, so that would determine whether it is using information from other workspaces.

@timothycarambat It's referencing docs that *are not* embedded in the given workspace.


@timothycarambat commented on GitHub (Jun 28, 2023):

> > When this problem occurs, is it referencing source documents in the reply? It may be returning general information, as if you had just asked ChatGPT a question.
> > It **must** return the source documents it got information from, so that would determine whether it is using information from other workspaces.
>
> @timothycarambat It's referencing docs that _are not_ embedded in the given workspace.

Then, doubly so: which documents is it referring to that you are sure are not embedded? It is possible that the document really is embedded in the vector DB but the database somehow has no associated document record for it, which would be weird but not impossible.

Collections are namespaced and should not collide; they are logically separated that way at the vector DB level.
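
As a rough sketch of how that mismatch could be checked, one could compare the sources that have vectors in a workspace's namespace against the document records for that workspace. Everything here is hypothetical: `find_orphaned_sources` and the sample lists are made up for illustration, and how you would actually pull either list depends on the vector DB and on how the app stores its document records.

```python
def find_orphaned_sources(vector_sources, workspace_documents):
    """Return sources that have vectors in the namespace but no document record.

    Both arguments are plain iterables of source names (assumed inputs);
    fetching them from a real vector DB or database is not shown.
    """
    return sorted(set(vector_sources) - set(workspace_documents))


# Hypothetical data: one entry per stored chunk, so sources may repeat.
vector_sources = ["a.txt", "b.pdf", "b.pdf", "c.md"]
workspace_documents = ["a.txt", "c.md"]

orphans = find_orphaned_sources(vector_sources, workspace_documents)
print(orphans)
```

Any name in `orphans` would be a document that can still be cited in chat even though the workspace does not list it as embedded.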


@danglingptr0x0 commented on GitHub (Jul 5, 2023):

> > > When this problem occurs, is it referencing source documents in the reply? It may be returning general information, as if you had just asked ChatGPT a question.
> > > It **must** return the source documents it got information from, so that would determine whether it is using information from other workspaces.
> >
> > @timothycarambat It's referencing docs that _are not_ embedded in the given workspace.
>
> Then, doubly so: which documents is it referring to that you are sure are not embedded? It is possible that the document really is embedded in the vector DB but the database somehow has no associated document record for it, which would be weird but not impossible.
>
> Collections are namespaced and should not collide; they are logically separated that way at the vector DB level.

@timothycarambat Hm, they are vectorized, but not embedded in the given workspace, according to the doc selector modal. Yet they are being referenced in the chat. So what you said would, I guess, be the opposite of what I am seeing: docs are being referenced but were never even selected for the given workspace in the first place. I can screencap what I'm seeing if it persists after updating. I haven't interacted with it for a week or two.


@timothycarambat commented on GitHub (Nov 14, 2023):

Marking this issue as stale. We have revised permissions and other document-related features, so please try to replicate it again.


@danglingptr0x0 commented on GitHub (Nov 14, 2023):

> Marking this issue as stale. We have revised permissions and other document-related features, so please try to replicate it again.

Will do.
