Recommendation algorithm set up #6064

Open
opened 2026-02-22 12:24:27 -05:00 by deekerman · 3 comments

Originally created by @solidheron on GitHub (Apr 10, 2025).

Describe the problem to be solved

tl;dr: I'm proposing the addition of two elements to a video's .json value: Video_description_vector_history and Video_description_vector. The first element records the history of all descriptions submitted for a video, and the second represents what an instance decides to share from that description.

I've been thinking about recommendation algorithms for PeerTube and what that would entail. I'm currently in the research and brainstorming phase. One idea I've come up with is that viewers could run a recommendation algorithm locally on their own machines, using information from the API and possibly the video content itself. For now, I'm calling the description of a video—such as the category it belongs to—a Video_description_vector. This will be used to help shape a user vector, which will guide what videos are recommended. I haven't fully developed the algorithm yet, but I anticipate that in the future there will be a variety of algorithms, both local and instance-based.
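As a rough illustration of how a video vector could shape a user vector on the viewer's machine, here is a minimal sketch. It is not the algorithm itself (that is still undeveloped); the additive weighting, the lowercase tag normalization, and the function names `update_user_vector` and `score_video` are all assumptions made for the example.

```python
from collections import defaultdict


def update_user_vector(user_vector, video_vector, weight=1.0):
    """Fold one watched video's tags into the user vector (assumed additive scheme)."""
    for tag in video_vector.get("isTrue", []):
        user_vector[tag.lower()] += weight
    for tag in video_vector.get("isFalse", []):
        user_vector[tag.lower()] -= weight
    return user_vector


def score_video(user_vector, video_vector):
    """Score a candidate: add user weights for its isTrue tags, subtract for isFalse tags."""
    score = 0.0
    for tag in video_vector.get("isTrue", []):
        score += user_vector.get(tag.lower(), 0.0)
    for tag in video_vector.get("isFalse", []):
        score -= user_vector.get(tag.lower(), 0.0)
    return score


# The viewer watched one video; everything stays local to their machine.
user_vector = defaultdict(float)
update_user_vector(user_vector, {"isTrue": ["Browns", "Charity"], "isFalse": ["football"]})

charity_score = score_video(user_vector, {"isTrue": ["Charity", "ALS"]})
football_score = score_video(user_vector, {"isTrue": ["football"]})
print(charity_score, football_score)  # 1.0 -1.0
```

A real algorithm would probably want decay over time and some normalization, but the data it needs is just the per-video tag lists.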

All algorithms, however, will need a video descriptor. I'm starting to form my own standard for describing videos, but I recognize that someone else may eventually come up with a better standard. Therefore, I want to suggest adding two .json elements: Video_description_vector and Video_description_vector_history, as shown below.

Video_description_vector is a simple element that an instance can share upon request. Each sub-element represents a different standard for describing a video. In my example, I include my own proposed standard and an imaginary future standard to illustrate how multiple standards could coexist.
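To show how a client might cope with several standards sitting side by side, here is a small sketch that picks the first standard it understands and ignores the rest. The function name `pick_standard` and the standard names `standard_a` and `standard_b` are placeholders invented for the example.

```python
def pick_standard(description_vector, supported):
    """Return (name, payload) for the first supported standard present, else None.

    `supported` is ordered by the client's preference.
    """
    for name in supported:
        if name in description_vector:
            return name, description_vector[name]
    return None


# A video that carries two coexisting standards (both names are made up).
description_vector = {
    "standard_a": {"isTrue": ["Charity"], "isFalse": ["Sports"]},
    "standard_b": {"some": {"other": "shape"}},
}

# This client only understands standard_a; a newer client could list standard_b first.
chosen = pick_standard(description_vector, ["standard_a"])
unknown = pick_standard(description_vector, ["standard_z"])
print(chosen, unknown)
```

Because each standard lives under its own key, older clients can skip keys they do not recognize without breaking.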

Video_description_vector_history records who submitted a categorical recommendation for a video to an instance. That’s why each entry includes the sub-elements "name" and "host" to indicate the source of the submission. The "uuid" identifies which video the recommendation applies to. Of course, an instance can choose whom to trust—or retroactively decide to disregard certain sources.

Below is a made-up example. It’s about the Cleveland Browns, an American football team, doing a fundraiser for charity. The isTrue element lists what categories the video belongs to, while the isFalse element lists what categories it does not belong to—subjectively, of course. Categories like “football” or “sports” are excluded because the Browns are not actually playing football or engaging in sports in the video.

{
  "Video_description_vector": {
    "recommended_standard": {
      "isTrue": ["Browns", "Charity", "Cleveland", "ALS", "fund raiser"],
      "isFalse": ["Sports", "football", "Cincinnati"]
    },
    "future_standard": {
      "doesnt": {
        "subarray": { "example": "no" },
        "exist": ["Sports", "football", "Cincinnati"]
      }
    }
  },
  "Video_description_vector_history": [
    {
      "name": "vidchase",
      "host": "videovortex.tv",
      "submitted_date": "11/15/2020",
      "uuid": "this and everything above can be removed if inside a video.json",
      "recommended_standard": {
        "isTrue": ["Browns", "Charity", "Cleveland"],
        "isFalse": ["Sports", "football", "Cincinnati"]
      }
    },
    {
      "name": "composite",
      "host": "combined.instance",
      "submitted_date": "4/15/2024",
      "uuid": "this and everything above can be removed if inside a video.json",
      "recommended_standard": {
        "isTrue": ["fund raiser", "Charity", "ALS"]
      }
    },
    {
      "name": "Troll",
      "host": "wrong.info",
      "submitted_date": "6/9/0420",
      "uuid": "example",
      "recommended_standard": {
        "isTrue": ["sack", "ballz"]
      }
    },
    {
      "name": "GoodFaith_but_wrong",
      "host": "other.instance",
      "submitted_date": "1/14/2023",
      "uuid": "example",
      "recommended_standard": {
        "isTrue": ["Browns", "NFL", "football"]
      }
    }
  ]
}
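The trust decision described earlier (an instance choosing whose submissions to honor) could be sketched like this: filter the history by host, then merge what is left into the vector the instance shares. The function name `build_shared_vector`, the hosts, and the standard name `example_standard` are assumptions for illustration, and the merge rule (a plain union in submission order) is only one possible policy.

```python
def build_shared_vector(history, trusted_hosts, standard):
    """Merge isTrue/isFalse tags from trusted history entries for one standard."""
    is_true, is_false = [], []
    for entry in history:
        if entry.get("host") not in trusted_hosts:
            continue  # untrusted source: ignored, but still kept in the history
        payload = entry.get(standard)
        if not isinstance(payload, dict):
            continue  # entry does not use this standard
        for tag in payload.get("isTrue", []):
            if tag not in is_true:
                is_true.append(tag)
        for tag in payload.get("isFalse", []):
            if tag not in is_false:
                is_false.append(tag)
    return {standard: {"isTrue": is_true, "isFalse": is_false}}


history = [
    {"name": "alice", "host": "trusted.example",
     "example_standard": {"isTrue": ["Charity", "Cleveland"], "isFalse": ["Sports"]}},
    {"name": "troll", "host": "untrusted.example",
     "example_standard": {"isTrue": ["spam"]}},
    {"name": "bob", "host": "also-trusted.example",
     "example_standard": {"isTrue": ["Charity", "ALS"]}},
]

shared = build_shared_vector(
    history, {"trusted.example", "also-trusted.example"}, "example_standard"
)
print(shared)
```

Since the history is never modified, an instance can change its trusted set later and simply rebuild the shared vector. This sketch does not resolve a tag that one source marks true and another marks false; that would need its own policy.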

Describe the solution you would like

No response


@MadMan247 commented on GitHub (May 7, 2025):

Would this require this json file to be added to peertube itself, or is this something that could be implemented into a plugin?


@solidheron commented on GitHub (May 7, 2025):

> Would this require this json file to be added to peertube itself, or is this something that could be implemented into a plugin?

I thought about it, and it can be done any number of ways. One instance could compute the video recommendation vectors, with each PeerTube user simply making requests to that instance, or every PeerTube instance could have this feature.

In my browser extension I just put the vector inside the JSON entry that exists for each PeerTube video (just keeping it simple).

This is more of a proposal/whitepaper for the basis of an easy-to-run, ethical algorithm for PeerTube and the fediverse.


@coding-crying commented on GitHub (Nov 7, 2025):

This tag-based approach seems pretty practical. I've been experimenting with something similar for ActivityPub and your metadata proposal actually sidesteps the hardest part - you don't need to deal with processing video files directly.

What I've been trying is converting the tag vectors to text ("This video is about Browns, Charity, Cleveland...") and running them through a small embedding model (all-MiniLM-L6-v2, ~80MB). Combined with some per-user personalization LoRAs and basic collaborative filtering, it might work decently for recommendations without needing much infrastructure. At least in theory - I haven't tested it at any real scale yet.
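A rough sketch of the tags-to-text step, with similarity computed on the result. To stay dependency-free, the embedding here is a toy word-count stand-in (`toy_embed`, an assumption for this example, not a real model); in practice it could be swapped for something like `SentenceTransformer("all-MiniLM-L6-v2").encode(text)` from the sentence-transformers package. The LoRA and collaborative-filtering parts are not shown.

```python
import math
from collections import Counter


def tags_to_text(tags):
    """Turn a tag list into the sentence that would be fed to the embedding model."""
    return "This video is about " + ", ".join(tags) + "."


def toy_embed(text):
    """Stand-in embedding: lowercase word counts. NOT a real embedding model."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return Counter(words)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


watched = toy_embed(tags_to_text(["Browns", "Charity", "Cleveland"]))
similar = toy_embed(tags_to_text(["Charity", "ALS"]))
unrelated = toy_embed(tags_to_text(["football", "NFL"]))

# The candidate that shares a tag with the watched video ranks higher.
print(cosine(watched, similar) > cosine(watched, unrelated))  # True
```

With the toy embedding only exact word overlap counts; a real sentence-embedding model would presumably also place related but non-identical tags close together.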

Main unknowns are how well the tag-based approach captures enough signal compared to actual video content, and whether the collaborative filtering works across small instance populations. Cold start is definitely an issue too. Your vector history with source tracking could help with tag quality though.

If you want to experiment with it, I've got some prototype code for Mastodon that could probably be adapted. No idea if it'll work well for video recommendations specifically, but might be worth testing.

Reference
starred/PeerTube#6064