[YouTube] youtube-dl does not find videos listed on user/channel "shorts" subpage #25559

Closed
opened 2026-02-21 13:51:46 -05:00 by deekerman · 9 comments

Originally created by @GigoMego on GitHub (Nov 7, 2022).

Originally assigned to: @zhangeric-15 on GitHub.

Hi

youtube-dl won't parse videos on a user/channel "shorts" subpage:

https://www.youtube.com/c/Sonyakisa8TT/shorts

If the number of videos is large, the subpages are downloaded, but the output is "...Shorts: Downloading 0 videos".

Can it be fixed?


@dirkf commented on GitHub (Nov 7, 2022):

Reproducible in git master.

The Shorts tab isn't currently extracted but as its structure appears to be very similar to the Videos tab, there should be a simple patch, like this:

--- old/youtube_dl/extractor/youtube.py
+++ new/youtube_dl/extractor/youtube.py
@@ -2680,7 +2680,10 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
 
     def _rich_grid_entries(self, contents):
         for content in contents:
-            video_renderer = try_get(content, lambda x: x['richItemRenderer']['content']['videoRenderer'], dict)
+            video_renderer = try_get(content,
+                (lambda x: x['richItemRenderer']['content']['videoRenderer'],
+                 lambda x: x['richItemRenderer']['content']['reelItemRenderer']),
+                dict)
             if video_renderer:
                 entry = self._video_entry(video_renderer)
                 if entry:

Then:

$ python -m youtube_dl --flat-playlist 'https://www.youtube.com/c/Sonyakisa8TT/shorts'
[youtube:tab] Sonyakisa8TT: Downloading webpage
[download] Downloading playlist: Sonyakisa8 TT - Shorts
[youtube:tab] Downloading page 1
[youtube:tab] Downloading page 2
[youtube:tab] Downloading page 3
[youtube:tab] playlist Sonyakisa8 TT - Shorts: Downloading 151 videos
[download] Downloading video 1 of 151
[download] Downloading video 2 of 151
[download] Downloading video 3 of 151
[download] Downloading video 4 of 151
[download] Downloading video 5 of 151
[download] Downloading video 6 of 151
...
[download] Downloading video 150 of 151
[download] Downloading video 151 of 151
[download] Finished downloading playlist: Sonyakisa8 TT - Shorts
$
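
The patch above relies on `try_get` accepting a tuple of getters and returning the first result of the expected type. The following is a minimal, self-contained sketch of that fallback behaviour, not the project's code: `try_get` here is a simplified stand-in for the youtube-dl utility, `rich_grid_video_ids` is a hypothetical helper, and the `videoId` key and sample data are assumptions made for illustration.

```python
def try_get(src, getters, expected_type=None):
    """Simplified stand-in: return the first getter result of the expected type."""
    if not isinstance(getters, (list, tuple)):
        getters = [getters]
    for getter in getters:
        try:
            value = getter(src)
        except (AttributeError, KeyError, TypeError, IndexError):
            continue  # this getter does not match; try the next one
        if expected_type is None or isinstance(value, expected_type):
            return value
    return None


def rich_grid_video_ids(contents):
    """Hypothetical helper: collect ids from Videos-tab and Shorts-tab items."""
    ids = []
    for content in contents:
        renderer = try_get(
            content,
            (lambda x: x['richItemRenderer']['content']['videoRenderer'],
             lambda x: x['richItemRenderer']['content']['reelItemRenderer']),
            dict)
        if renderer:
            ids.append(renderer.get('videoId'))  # 'videoId' is assumed here
    return ids


# Made-up sample data: one Videos-style item, one Shorts-style item, one other.
SAMPLE = [
    {'richItemRenderer': {'content': {'videoRenderer': {'videoId': 'video0001'}}}},
    {'richItemRenderer': {'content': {'reelItemRenderer': {'videoId': 'short0001'}}}},
    {'continuationItemRenderer': {}},
]

print(rich_grid_video_ids(SAMPLE))
```

Items that match neither getter are skipped, which is why an unpatched extractor that only looks for `videoRenderer` would report 0 videos for a tab made up entirely of Shorts items.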

@zhangeric-15 commented on GitHub (Nov 7, 2022):

Hello, I am currently a student at a university. For my software engineering class, our final project is to contribute to an open source project. My classmate and I are interested in attempting to fix this bug, but wanted to reach out to any maintainers of this repo to see if there's any additional information that could help us get started. We will be spending a couple weeks on this project and would appreciate any advice on how we can help.

Thank you!


@dirkf commented on GitHub (Nov 7, 2022):

As you can see I've already provided a fix that works in this particular case. But that's just the start of the work that leads to a solution being committed to the repository.

If you're still interested, this is what I'd suggest:

  • familiarise yourselves with the program by reading the manual, especially the development section, trying various command options, and studying the source repository
  • as suggested in the documentation, fork the repo, clone the forked repo to your development environment and start a branch for the fix
  • review my patch, check it against the more sophisticated yt-dlp fork (which already handles this correctly) to see if it could be improved (eg, it looks like we could get an interim title along with the video_id of each Short -- maybe that would be possible for Videos too?)
  • create a new version of youtube_dl/extractor/youtube.py in your patch branch
  • test it with some further real-life examples
  • create a test-case for the fix in the _TESTS list of YoutubeTabIE and check it locally
  • once you're happy with the patch branch, push it to your GH repo and create a Pull Request against the master branch of the yt-dl repo: check the result of your test in the integration workflow that the project runs on each commit.
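
As a rough illustration of the test-case step in the list above, an entry for the `_TESTS` list might look something like the sketch below. The URL, title and video count are taken from the log earlier in this thread; the exact set of keys a real entry needs is an assumption, and `check_playlist` is a hypothetical helper that only mimics the kind of comparison a test run performs.

```python
# Sketch of a possible _TESTS entry; values come from the log in this thread.
SHORTS_TEST = {
    'note': 'Shorts tab of a channel',
    'url': 'https://www.youtube.com/c/Sonyakisa8TT/shorts',
    'playlist_mincount': 151,
    'info_dict': {
        'title': 'Sonyakisa8 TT - Shorts',
    },
}


def check_playlist(test, result):
    """Hypothetical check: enough entries and matching expected fields."""
    if len(result.get('entries', [])) < test['playlist_mincount']:
        return False
    return all(result.get(key) == value
               for key, value in test['info_dict'].items())


# Fake extraction result, used only to exercise the check locally.
fake_result = {
    'title': 'Sonyakisa8 TT - Shorts',
    'entries': [{'id': 'entry%03d' % n} for n in range(151)],
}
print(check_playlist(SHORTS_TEST, fake_result))
```

A minimum count rather than an exact one is used because a channel's number of Shorts can change after the test is written.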

You should be aware that (for obvious reasons) the YouTube extractor module is the most complex of the yt-dl extractors, about twice the size of the next biggest site-specific extractor. The yt-dlp extractor is twice as big again!

Of course there are other open issues for which no solution yet exists that you could try to solve but what I've suggested is probably a good dose of software engineering.


@zhangeric-15 commented on GitHub (Nov 7, 2022):

Thank you so much for your detailed response and advice! I double-checked our project requirements, and unfortunately, we have to find a bug/feature that hasn't been solved yet, so it looks like we cannot work on this bug. Do you have any suggestions on other open issue reports we could help out with?

Edit: Actually hold on, let me confirm with my course staff
Edit #2: We can take this bug!


@zhangeric-15 commented on GitHub (Nov 28, 2022):

@dirkf
Could you clarify what you mean by "(eg, it looks like we could get an interim title along with the video_id of each Short -- maybe that would be possible for Videos too?)"? I thought the extractor for YouTube videos was already able to retrieve a video's id and title, unless I'm not fully understanding what you mean. Thanks again for your help!


@dirkf commented on GitHub (Nov 28, 2022):

The extraction of videos in a playlist has two steps:

  1. extract playlist as a list (-ish) of items
  2. extract each video in the playlist.

My suggestion was that the playlist data would allow a title to be proposed in the list of playlist items, whereas the final title isn't found until step 2.

Then the output of --flat-playlist --dump-json would include a title that wouldn't otherwise be available.
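
The two steps described above can be sketched as follows. This is an illustrative model only, not youtube-dl code: `flat_entries`, `extract_video`, the `headline` field and the sample items are all hypothetical names and data chosen for the example.

```python
import json

# Hypothetical, already-parsed playlist items; field names are illustrative.
PLAYLIST_ITEMS = [
    {'videoId': 'aaa111', 'headline': 'First short (interim title)'},
    {'videoId': 'bbb222'},  # no title available at the listing stage
]


def flat_entries(items):
    """Step 1: list the playlist items without fetching each video."""
    for item in items:
        entry = {'_type': 'url', 'id': item['videoId']}
        title = item.get('headline')
        if title:
            entry['title'] = title  # interim title proposed by the playlist data
        yield entry


def extract_video(entry, fetch_title):
    """Step 2: resolve one entry fully; the final title replaces any interim one."""
    full = dict(entry, _type='video')
    full['title'] = fetch_title(entry['id'])
    return full


flat = list(flat_entries(PLAYLIST_ITEMS))
for entry in flat:
    # Roughly what a flat listing dumped as JSON would contain per item.
    print(json.dumps(entry, sort_keys=True))
```

In this model, stopping after step 1 still yields a title for any item whose listing data carries one, while step 2 (simulated by the `fetch_title` callback) is needed for the final title.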


@zhangeric-15 commented on GitHub (Nov 30, 2022):

Ok, I think that makes a little more sense. Thanks!


@zyrup commented on GitHub (Dec 13, 2022):

Hi, I wanted to ask what the current status on this bug is. Two weeks have passed now, has any commit happened to any branch that could be used?


@zhangeric-15 commented on GitHub (Dec 13, 2022):

The fix is currently in the process of a pull request here: https://github.com/ytdl-org/youtube-dl/pull/31409
I'm not sure when they are planning to merge though.
