Please support dzen.ru quickly #26754

Open
opened 2026-02-21 14:28:11 -05:00 by deekerman · 24 comments
Owner

Originally created by @aflamrip on GitHub (Dec 2, 2023).

Originally assigned to: @Revisto on GitHub.

Checklist

  • I'm reporting a new site support request
  • I've verified that I'm running youtube-dl version 2021.12.17
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that none of provided URLs violate any copyrights
  • I've searched the bugtracker for similar site support requests including closed ones

Example URLs

  • Single video: https://dzen.ru/video/watch/6471e50e06863726828b435c
  • Single video: https://dzen.ru/embed/vnVEaPfaSym8?from_block=partner&from=zen&mute=0&autoplay=0&tv=0
  • Playlist: nothing

Description

It is a very popular Russian site and needs to be supported.
Author
Owner

@dirkf commented on GitHub (Dec 2, 2023):

Please review and complete the checklist (why is it there?).

The non-JS page seen by yt-dl contains a gigantic hydration JSON object as a parameter of a JS function that defines the page. Video links can be extracted using the traversal path 'data', ..., 0, 'videoViewer', 'items', ..., 0, 'rawStreams', 'SingleStream', ..., 'StreamInfo', ..., 'OutputStream'. 'Title' and 'Thumbnail' are available alongside 'StreamInfo'. I was able to watch the clip about the fake IKEA (why not ШведГаус?) using one of these links in mpv.
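
The traversal described above can be sketched in plain Python. This is only an illustration under stated assumptions: the `walk` helper and the `SAMPLE` data are hypothetical, the nesting of `SAMPLE` (including the keys 'page' and 'abc') is invented to fit the quoted path since the real hydration object is not shown in this thread, and each `...` is read as "branch over every child", which is how `traverse_obj` in youtube-dl's utils treats `Ellipsis`. If the `...` were meant as elided keys, the path would differ.

```python
def walk(obj, path):
    """Yield every value reachable from obj by following path.

    Ellipsis branches over all values of a dict or all items of a list;
    any other key is used as a plain dict key or list index.
    """
    if not path:
        yield obj
        return
    key, rest = path[0], path[1:]
    if key is Ellipsis:
        if isinstance(obj, dict):
            children = list(obj.values())
        elif isinstance(obj, list):
            children = obj
        else:
            children = []
    else:
        try:
            children = [obj[key]]
        except (KeyError, IndexError, TypeError):
            children = []  # missing key: this branch yields nothing
    for child in children:
        for found in walk(child, rest):
            yield found


# Path quoted in the comment above, with each "..." written as Ellipsis.
STREAM_PATH = ('data', Ellipsis, 0, 'videoViewer', 'items', Ellipsis, 0,
               'rawStreams', 'SingleStream', Ellipsis, 'StreamInfo',
               Ellipsis, 'OutputStream')

# Invented miniature of the hydration object (hypothetical shape and values).
SAMPLE = {'data': {'page': [{'videoViewer': {'items': {'abc': [{
    'rawStreams': {'SingleStream': [{
        'Title': 'Example clip',
        'Thumbnail': 'https://example.invalid/thumb.jpg',
        'StreamInfo': [{'OutputStream': 'https://example.invalid/master.m3u8'}],
    }]},
}]}}}]}}

stream_urls = list(walk(SAMPLE, STREAM_PATH))
```

Inside an actual extractor the same path would more likely be handed to `traverse_obj` directly; the hand-written walker is used here only so the sketch runs without youtube-dl installed.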

Author
Owner

@aflamrip commented on GitHub (Dec 3, 2023):

I have no experience in understanding these matters, but I hope that you will support downloading from this site, as it is a strong competitor to ok.ru and vk.com.

Author
Owner

@dirkf commented on GitHub (Dec 3, 2023):

I encourage someone who's interested to take this up. Otherwise it's at the end of a long queue.

Author
Owner

@Revisto commented on GitHub (Dec 9, 2023):

Hi, can I work on this issue as my first contribution to youtube-dl?

Author
Owner

@dirkf commented on GitHub (Dec 9, 2023):

By all means. Are you happy that you know what to do? See #29310 in addition to the manual and FAQ.

Author
Owner

@3052 commented on GitHub (Dec 11, 2023):

note this uBlock Origin rule is needed for the web client:

dzen.ru dzeninfra.ru * noop
Author
Owner

@aflamrip commented on GitHub (Dec 15, 2023):

I am using a PeerTube instance that uses ytdl to import videos, and this is the error that appears:

Command failed with exit code 1: /usr/bin/python3 /var/www/peertube/storage/bin/yt-dlp --merge-output-format mp4 -f bestvideo[vcodec^=avc1][height=720]+bestaudio[ext=m4a]/bestvideo[vcodec!*=av01][vcodec!*=vp9.2][height=720]+bestaudio/bestvideo[vcodec^=avc1][height<=720]+bestaudio[ext=m4a]/bestvideo[vcodec!*=av01][vcodec!*=vp9.2]+bestaudio/best[vcodec!*=av01][vcodec!*=vp9.2]/bestvideo[ext=mp4]+bestaudio[ext=m4a]/best -o /var/www/peertube/storage/tmp/06fa3722e25f07e41b6405880136fbef4fe5dae7fe715463a14c23722933cf18-import https://dzen.ru/video/watch/651a03f748b261411c264611
ERROR: 'NoneType' object is not iterable
[ZenYandex] Extracting URL: https://dzen.ru/video/watch/651a03f748b261411c264611
[ZenYandex] 651a03f748b261411c264611: Downloading webpage
[ZenYandex] 651a03f748b261411c264611: Redirecting
Author
Owner

@dirkf commented on GitHub (Dec 15, 2023):

How is this relevant to youtube-dl's support for dzen.ru? Please review #30839:

Command failed with exit code 1: /usr/bin/python3 /var/www/peertube/storage/bin/yt-dlp ...

"If you were actually using yt-dlp ..."

Author
Owner

@abhimessi16 commented on GitHub (Dec 21, 2023):

Hi @dirkf, is @Revisto still working on this issue? If not, I would like to work on it.
Author
Owner

@dirkf commented on GitHub (Dec 21, 2023):

There's no PR yet. Unless we hear otherwise, carry on.

Author
Owner

@Lil-Nugs commented on GitHub (Jan 28, 2024):

Hi, I'm working on this right now but was wondering if there's a certain way we should handle redirects and how you got the page downloaded, @dirkf? I'm running into an issue where it hits these two URLs before the final webpage with the video and JSON object:

  • https://sso.passport.yandex.ru/push?uuid=d738a747-82b1-432d-8536-30918ebb94aa&retpath=https%3A%2F%2Fdzen.ru%2Fvideo%2Fwatch%2F6471e50e06863726828b435c - sends back a 200
  • https://sso.dzen.ru/install?uuid=604f2166-bd61-4d1a-bfed-715e331e51d6 - sends back a 200

So the _download_webpage_handle function downloads the empty page at the first URL it gets redirected to, since these pages send back a 200 rather than a 302 for the next redirect.
Author
Owner

@Lil-Nugs commented on GitHub (Jan 28, 2024):

Hm, looking at some other issues, it seems like this could be bypassed by grabbing the cookies after visiting those URLs.
Author
Owner

@dirkf commented on GitHub (Jan 28, 2024):

Exactly.

I found that I could just download the second problem video and find the media link. Looking at the first one, it seems to go to the SSO link that you mention, presumably because it needs a login. So the first thing to try is passing cookies from a logged-in browser session that can play that video.

Getting --username ... --password ... to work would be ideal but experience indicates that today's login procedures are either too hard to implement or too transient or both. But maybe passport.yandex.ru has been handled already -- I haven't checked.
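
As a rough illustration of the cookie approach suggested above, the standard library can load a Netscape-format cookies.txt exported from a logged-in browser and attach it to every request. This is a sketch, not youtube-dl's own code: the helper name `opener_from_cookie_file` is hypothetical, and the cookie name `session_id` with its value is a placeholder, since the cookies dzen.ru actually sets are not listed in this thread.

```python
import http.cookiejar
import os
import tempfile
import urllib.request


def opener_from_cookie_file(path):
    """Load a Netscape-format cookies.txt and return (opener, jar)."""
    jar = http.cookiejar.MozillaCookieJar(path)
    jar.load()  # raises http.cookiejar.LoadError on a malformed file
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


# Demo with a throwaway file; 'session_id' and its value are placeholders.
COOKIE_TEXT = (
    '# Netscape HTTP Cookie File\n'
    '.dzen.ru\tTRUE\t/\tTRUE\t4102444800\tsession_id\tplaceholder\n'
)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as handle:
    handle.write(COOKIE_TEXT)
try:
    opener, jar = opener_from_cookie_file(handle.name)
    loaded = sorted((c.domain, c.name) for c in jar)
finally:
    os.remove(handle.name)
```

youtube-dl accepts the same file format through its --cookies option, so an extractor under development can be tried against a login-gated video without writing any login code first.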

Author
Owner

@Debojit-0 commented on GitHub (Mar 14, 2025):

@dirkf
Let me try contributing to this as well; it would be my first contribution to an open source project.
Author
Owner

@dirkf commented on GitHub (Mar 14, 2025):

No-one has reported any further progress: have a go. Per https://github.com/ytdl-org/youtube-dl/issues/29724#issuecomment-891937212 and linked issue #29310, make sure to review the base class (extractor/common.py) and some other new extractors to understand the typical style and API usage.

Author
Owner

@nolanwelch commented on GitHub (Mar 14, 2025):

@Debojit-0 I previously made an attempt at implementing this extractor. Please feel free to use my code here (https://github.com/nolanwelch/youtube-dl/blob/dzen/youtube_dl/extractor/dzenru.py) if you find it helpful.
Author
Owner

@Debojit-0 commented on GitHub (Mar 14, 2025):

And were you able to resolve it?
Author
Owner

@nolanwelch commented on GitHub (Mar 14, 2025):

Unfortunately no. But I think it's a good start.

Author
Owner

@Debojit-0 commented on GitHub (Mar 14, 2025):

OK, fair enough. It's a good start for me as well, as I am still in the learning phase 🙂
Author
Owner

@Debojit-0 commented on GitHub (Mar 19, 2025):

@nolanwelch @dirkf
The issue is that the site is restricting me with authentication.
Here is a summary of what I have understood: this is a package that helps you download both webpages and videos from different sources. The sources are handled in the extractor folder, which has multiple .py files, each representing a different website, so I need to create a dzen file for the dzen.ru website, which is a Russian website.
So I created the class for this website in the extractor folder and ran the test file as well, but I get the error:
No video formats found!

Now this could be for many reasons:

  • Outdated youtube-dl: the site structure might have changed, and youtube-dl needs to be updated.
  • Authentication issue: the video might be behind a login wall, requiring authentication.
  • Region restrictions: the video might not be available in your region.
  • Invalid URL: the URL might be incorrect or inaccessible.

So any help would be great, as I am new to this. I am trying to find the solution in the meantime, in case someone can help or guide me.

Thank you
Author
Owner

@dirkf commented on GitHub (Mar 19, 2025):

The debugger (import pdb; pdb.set_trace()) is your friend.

  • Be sure to review the links (and their linked material) that I posted above. Even the worked example from 2014 or so is still relevant.
  • Did you add your class to extractors/extractors.py?
  • Does your class inherit from InfoExtractor?
  • Does your class have a _VALID_URL that matches the URL of a test page?
  • Does it have a _real_extract() method?
  • Does the method extract an ID from its url argument?
  • Does it fetch the webpage from url and extract the media data from it?
  • If not, does it use some API URL to do that?
  • Does it populate a dictionary with at least id, title and formats values according to the spec in extractors/common.py?
  • Does it call _sort_formats() on the formats?
  • Does it return the dictionary if successful, or else raise ExtractorError?
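
A few of the checklist items can be tried outside youtube-dl first. The sketch below covers only the URL pattern, the ID extraction, and the shape of the result dictionary. The regular expression is an assumption derived from the two example URLs in the issue, not a tested `_VALID_URL`, and `match_id` and `build_info` are hypothetical stand-ins for what `InfoExtractor._match_id()` and `_real_extract()` would do inside a real extractor class.

```python
import re

# Assumed pattern; in an extractor this would be the class attribute _VALID_URL.
VALID_URL = r'https?://(?:www\.)?dzen\.ru/(?:video/watch|embed)/(?P<id>[^/?#&]+)'


def match_id(url):
    """Return the video ID, or None if the URL is not one we handle."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


def build_info(video_id, title, stream_urls):
    """Assemble the minimal result dict: id, title and formats."""
    return {
        'id': video_id,
        'title': title,
        'formats': [{'url': u} for u in stream_urls],
    }


watch_id = match_id('https://dzen.ru/video/watch/6471e50e06863726828b435c')
embed_id = match_id(
    'https://dzen.ru/embed/vnVEaPfaSym8?from_block=partner&from=zen')
info = build_info(watch_id, 'Example title',
                  ['https://example.invalid/master.m3u8'])
```

If `formats` ends up empty because the downloaded page was not the video page, youtube-dl reports "No video formats found!", so checking what the webpage actually contains is usually the first debugging step.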
Author
Owner

@Debojit-0 commented on GitHub (Mar 23, 2025):

  • Did you add your class to extractors/extractors.py? --> yes
  • Does your class inherit from InfoExtractor? --> yes
  • Does your class have a _VALID_URL that matches the URL of a test page? --> I guess yes
  • Does it have a _real_extract() method? --> yes
  • Does the method extract an ID from its url argument? --> yes; not the method itself, but I can see it is extracting the URL, though it redirects me to the login page
  • Does it populate a dictionary with at least id, title and formats values according to the spec in extractors/common.py? --> no
  • Does it call _sort_formats() on the formats? --> how do I check this?
  • Does it return the dictionary if successful, or else raise ExtractorError? --> it raises ExtractorError because no format is found, as it is directing me to the login authentication page

Here is the error:

EReturns the info extractor class with the given ie_name <class
'youtube_dl.extractor.dzen.DzenIE'>
Returns the info extractor class with the given ie_name <class
'youtube_dl.extractor.dzen.DzenIE'>
[Dzen] 6471e50e06863726828b435c: Downloading webpage
webpage1
scripts var it =
{"host":"https:\u002F\u002Fsso.dzen.ru\u002Finstall?uuid=c81f47cc-9856-4de2-9ffa-ed0beda56016","retpath":"https:\u002F\u002Fdzen.ru\u002Fvideo\u002Fwatch\u002F6471e50e06863726828b435c?is_autologin_ya=true"};(function()
{ var form = document.createElement('form'); var element1 =
document.createElement('input'); var element2 =
document.createElement('input'); element1.name = 'retpath';
element1.type = 'hidden'; element1.value = it.retpath;
form.appendChild(element1); element2.name = 'container'; element2.type =
'hidden'; element2.value =
'1742744099.10153489.3OXfvFuN4Jw6zs2g.PVdwL66GVBOAVp7NlR5qKPyORf2lQwsFdqQDVUWHd3gQ1XXBC9NCvnxmL9v6LJzQ23DMBlH_jTS1PqUGNjlFmBxBOvO05rCsgF06VtUP5cfEUFQS3QVlQsoO2j6bPmtz9izK6MSsyPl6w9IppCNNK5BL97wd_zgdBUuxIs1kHIqrisvlGQqIebPxgn8_3KuY_M1cie7iIo_3fllLY8oAsGClA-uxVDy0Kb_OTOzE8NwPNe-ebSZ93rJCDkWdVC8DX6poDD8vYRj9GM3zEtkL30qm8fgZeAPiYhb6yuazED5iJ-rESeCIo4GBzVgusep004zmc_91sGp4Mt-LBnc4_O5phLHvUv58GFFjRyfHvq4s30TB4T_nHrvAkwR_ZMy2fJe2bVSjxJCAfZBMK5wbu6xiKaVPoaLjx_NQSjVERJrrbWNJqwY8JDB1Quc1u23QNuG3VHFRm1aYwTWc5L3Nr99UNn89laRMWrC6S1vX2MIl-17Vhb9aCCpedWC6mkNPtS9O5vzDZmCZmDw-p7BOR0DrqUnlnpANCXnxLnyRRigvoF6TvuuR0IkcnazTJFhrwLh4XXWTVPcyhepUO_QF7YZ0fEEncrGMo8FscZmikfYbVIe8fuAA8MCp8tBRTJ0bLXOjkPZ2nrIN3AXyB43gZ_qNFLkrm8mQPrQs14EsK-l6bkWCC6Dwx5vhKsC4lDn4ABhJI8TxOCP_XCo_hMphDhJQ__HZfvtvtgUKtDl5ONNMFBnoEHsmkpQP_pOMuVdaZ0BzvZMgysxqmhDnksvRvh03--Nh0MU01SV6wpxiVUSVz_bJmL2XcDtljpwFQDldAUAM6IxYoiGRY73yfj_pyGXIWjwyixnGnvbxJWar2jxS9YVJGuaTG-AKbhj6BuNXMe8v0FdgsVNVsqlhtXwXd5PJ-xIKlIkqRItU6bR6UOsEA-LmiEhSI2vga_4WCOisk4sn43R8aFyWTbedrBmOEKZJ9rleIpijnosaxJRdrkRyqydjlg8V1mQ-c7Fqrf6CVYjffNkSzEM99qa5Zx-oSkHMJoiAYYQQYhBxx2IVhVjfg39yZzp10nvPNRi7fG2dEi7mJ7Hva3JNJZfUv8lbxo2uCg-nQTvSW15DD8MTKgI2xZHSnvQbZ2_emA1tOnflUziLeDnZe24vBDcQVykQm-L22v3M5Bm6K0qehHeaMNog-QXR.Ufmw94PycWbuZL0Ujfu74A';
form.appendChild(element2); form.method = 'POST'; form.action = it.host;
document.body.appendChild(form); form.submit();})();
ERROR: No video formats found!; please report this issue on
https://github.com/ytdl-org/youtube-dl/issues , using the appropriate
issue template. Make sure you are using the latest version; see
https://github.com/ytdl-org/youtube-dl/#user-content-installation on
how to update. Be sure to call youtube-dl with the --verbose option and
include the complete output.
Traceback (most recent call last):
File "C:\Users\DEBOJ\OneDrive\Desktop\opensource
projects\ytdl\youtube-dl\youtube_dl\YoutubeDL.py", line 875, in wrapper
return func(self, *args, **kwargs)
File "C:\Users\DEBOJ\OneDrive\Desktop\opensource
projects\ytdl\youtube-dl\youtube_dl\YoutubeDL.py", line 982, in
__extract_info
return self.process_ie_result(ie_result, download, extra_info)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DEBOJ\OneDrive\Desktop\opensource
projects\ytdl\youtube-dl\youtube_dl\YoutubeDL.py", line 1016, in
process_ie_result
return self.process_video_result(ie_result, download=download)
~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DEBOJ\OneDrive\Desktop\opensource
projects\ytdl\youtube-dl\youtube_dl\YoutubeDL.py", line 1752, in
process_video_result
raise ExtractorError('No video formats found!')
youtube_dl.utils.ExtractorError: No video formats found!; please report
this issue on https://github.com/ytdl-org/youtube-dl/issues , using the
appropriate issue template. Make sure you are using the latest version; see
https://github.com/ytdl-org/youtube-dl/#user-content-installation on
how to update. Be sure to call youtube-dl with the --verbose
option and include the complete output.
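
The page printed in the log above is not the video page but a JavaScript-driven SSO interstitial: it builds a form with `retpath` and `container` fields and POSTs it to the `host` URL. A sketch of detecting that page and recovering the POST target is given below. The function name `parse_sso_interstitial` is hypothetical, `SAMPLE_PAGE` is an abbreviated copy of the logged page with the long container token replaced by a placeholder, and whether replaying this POST actually leads to the video page has not been verified in this thread.

```python
import json
import re


def parse_sso_interstitial(page):
    """Return the POST target and hidden fields, or None for other pages."""
    config = re.search(r'var it\s*=\s*(\{.*?\});', page, re.DOTALL)
    token = re.search(r"element2\.value\s*=\s*'([^']+)'", page)
    if not (config and token):
        return None
    data = json.loads(config.group(1))  # decodes the \u002F escapes to '/'
    return {
        'action': data['host'],
        'fields': {'retpath': data['retpath'], 'container': token.group(1)},
    }


# Abbreviated from the log above; the container token is a placeholder.
SAMPLE_PAGE = r"""<body></body><script>var it =
{"host":"https:\u002F\u002Fsso.dzen.ru\u002Finstall?uuid=c81f47cc-9856-4de2-9ffa-ed0beda56016",
"retpath":"https:\u002F\u002Fdzen.ru\u002Fvideo\u002Fwatch\u002F6471e50e06863726828b435c?is_autologin_ya=true"};
(function() { var form = document.createElement('form');
element2.name = 'container'; element2.value = 'TOKEN-PLACEHOLDER';
form.method = 'POST'; form.action = it.host; form.submit();})();</script>"""

parsed = parse_sso_interstitial(SAMPLE_PAGE)
```

A check like this makes the failure explicit: when the function returns a result, the extractor has received the login hop rather than the hydration JSON, which explains the empty formats list.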

On Wed, Mar 19, 2025 at 3:45 PM dirkf @.***> wrote:

The debugger (import pdb; pdb.set_trace()) is your friend.

  • Be sure to review the links (and their linked material) that I
    posted above. Even the worked example from 2014 or so is still relevant.
  • Did you add your class to extractors/extractors.py?
  • Does your class inherit from InfoExtractor?
  • Does your class have a _VALID_URL that matches the URL of a test
    page?
  • Does it have a _real_extract() method?
  • Does the method extract an ID from its url argument?
  • Does it fetch the webpage from url and extract the media data from
    it?
  • If not, does it use some API URL to do that?
  • Does it populate a dictionary with at least id, title and formats
    values according to the spec in extractors/common.py?
  • Does it call _sort_formats() on the formats?
  • Does it return the dictionary if successful, or else raise
    ExtractorError?


Reply to this email directly, view it on GitHub
https://github.com/ytdl-org/youtube-dl/issues/32654#issuecomment-2736043762,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/ASNBXIOYO3VCATTT7NMBRC32VE7V3AVCNFSM6AAAAABZAEJPQ6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDOMZWGA2DGNZWGI
.
You are receiving this because you were mentioned.Message ID:
@.***>
[image: dirkf]dirkf left a comment (ytdl-org/youtube-dl#32654)
https://github.com/ytdl-org/youtube-dl/issues/32654#issuecomment-2736043762

The debugger (import pdb; pdb.set_trace()) is your friend.

  • Be sure to review the links (and their linked material) that I
    posted above. Even the worked example from 2014 or so is still relevant.
  • Did you add your class to extractors/extractors.py?
  • Does your class inherit from InfoExtractor?
  • Does your class have a _VALID_URL that matches the URL of a test
    page?
  • Does it have a _real_extract() method?
  • Does the method extract an ID from its url argument?
  • Does it fetch the webpage from url and extract the media data from
    it?
  • If not, does it use some API URL to do that?
  • Does it populate a dictionary with at least id, title and formats
    values according to the spec in extractors/common.py?
  • Does it call _sort_formats() on the formats?
  • Does it return the dictionary if successful, or else raise
    ExtractorError?


Reply to this email directly, view it on GitHub
https://github.com/ytdl-org/youtube-dl/issues/32654#issuecomment-2736043762,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/ASNBXIOYO3VCATTT7NMBRC32VE7V3AVCNFSM6AAAAABZAEJPQ6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDOMZWGA2DGNZWGI
.
You are receiving this because you were mentioned.Message ID:
@.***>

@Debojit-0 commented on GitHub (Mar 23, 2025): - Did you add your class to extractors/extractors.py? --> yes - Does your class inherit from InfoExtractor? --> yes - Does your class inherit from InfoExtractor? --> yes - Does your class have a _VALID_URL that matches the URL of a test page? --> i guess yes - Does it have a _real_extract() method? --yes - Does the method extract an ID from its url argument? --> yes not method but i can see its extracting the url but redirecting me to login page\Does it populate a dictionary with at least id, title and formats values according to the spec in extractors/common.py? --> no - Does it call _sort_formats() on the formats? --> how do i check this - Does it return the dictionary if successful, or else raise ExtractorError? --> raise ExtractorError as format not found as its directing me to the login authentication page here is the error EReturns the info extractor class with the given ie_name <class 'youtube_dl.extractor.dzen.DzenIE'> Returns the info extractor class with the given ie_name <class 'youtube_dl.extractor.dzen.DzenIE'> [Dzen] 6471e50e06863726828b435c: Downloading webpage webpage1 <body></body><script nonce='7065aec34050c5d5fd01729c19236188'>var it = {"host":"https:\u002F\ u002Fsso.dzen.ru\u002Finstall?uuid=c81f47cc-9856-4de2-9ffa-ed0beda56016","retpath":"https:\u002F\u002Fdzen.ru\u002Fvideo\u002Fwatch\u002F6471e50e06863726828b435c?is_autologin_ya=true"};(function() { var form = document.createElement('form'); var element1 = document.createElement('input'); var element2 = document.createElement('input'); element1.name = 'retpath'; element1.type = 'hidden'; element1.value = it.retpath; form.appendChild(element1); element2.name = 'container'; element2.type = 'hidden'; element2.value = 
'1742744099.10153489.3OXfvFuN4Jw6zs2g.PVdwL66GVBOAVp7NlR5qKPyORf2lQwsFdqQDVUWHd3gQ1XXBC9NCvnxmL9v6LJzQ23DMBlH_jTS1PqUGNjlFmBxBOvO05rCsgF06VtUP5cfEUFQS3QVlQsoO2j6bPmtz9izK6MSsyPl6w9IppCNNK5BL97wd_zgdBUuxIs1kHIqrisvlGQqIebPxgn8_3KuY_M1cie7iIo_3fllLY8oAsGClA-uxVDy0Kb_OTOzE8NwPNe-ebSZ93rJCDkWdVC8DX6poDD8vYRj9GM3zEtkL30qm8fgZeAPiYhb6yuazED5iJ-rESeCIo4GBzVgusep004zmc_91sGp4Mt-LBnc4_O5phLHvUv58GFFjRyfHvq4s30TB4T_nHrvAkwR_ZMy2fJe2bVSjxJCAfZBMK5wbu6xiKaVPoaLjx_NQSjVERJrrbWNJqwY8JDB1Quc1u23QNuG3VHFRm1aYwTWc5L3Nr99UNn89laRMWrC6S1vX2MIl-17Vhb9aCCpedWC6mkNPtS9O5vzDZmCZmDw-p7BOR0DrqUnlnpANCXnxLnyRRigvoF6TvuuR0IkcnazTJFhrwLh4XXWTVPcyhepUO_QF7YZ0fEEncrGMo8FscZmikfYbVIe8fuAA8MCp8tBRTJ0bLXOjkPZ2nrIN3AXyB43gZ_qNFLkrm8mQPrQs14EsK-l6bkWCC6Dwx5vhKsC4lDn4ABhJI8TxOCP_XCo_hMphDhJQ__HZfvtvtgUKtDl5ONNMFBnoEHsmkpQP_pOMuVdaZ0BzvZMgysxqmhDnksvRvh03--Nh0MU01SV6wpxiVUSVz_bJmL2XcDtljpwFQDldAUAM6IxYoiGRY73yfj_pyGXIWjwyixnGnvbxJWar2jxS9YVJGuaTG-AKbhj6BuNXMe8v0FdgsVNVsqlhtXwXd5PJ-xIKlIkqRItU6bR6UOsEA-LmiEhSI2vga_4WCOisk4sn43R8aFyWTbedrBmOEKZJ9rleIpijnosaxJRdrkRyqydjlg8V1mQ-c7Fqrf6CVYjffNkSzEM99qa5Zx-oSkHMJoiAYYQQYhBxx2IVhVjfg39yZzp10nvPNRi7fG2dEi7mJ7Hva3JNJZfUv8lbxo2uCg-nQTvSW15DD8MTKgI2xZHSnvQbZ2_emA1tOnflUziLeDnZe24vBDcQVykQm-L22v3M5Bm6K0qehHeaMNog-QXR.Ufmw94PycWbuZL0Ujfu74A'; form.appendChild(element2); form.method = 'POST'; form.action = it.host; document.body.appendChild(form); form.submit();})();</script> scripts var it = {"host":"https:\u002F\u002Fsso.dzen.ru\u002Finstall?uuid=c81f47cc-9856-4de2-9ffa-ed0beda56016","retpath":"https:\u002F\u002Fdzen.ru\u002Fvideo\u002Fwatch\u002F6471e50e06863726828b435c?is_autologin_ya=true"};(function() { var form = document.createElement('form'); var element1 = document.createElement('input'); var element2 = document.createElement('input'); element1.name = 'retpath'; element1.type = 'hidden'; element1.value = it.retpath; form.appendChild(element1); element2.name = 'container'; element2.type = 'hidden'; element2.value = 
'1742744099.10153489.3OXfvFuN4Jw6zs2g.PVdwL66GVBOAVp7NlR5qKPyORf2lQwsFdqQDVUWHd3gQ1XXBC9NCvnxmL9v6LJzQ23DMBlH_jTS1PqUGNjlFmBxBOvO05rCsgF06VtUP5cfEUFQS3QVlQsoO2j6bPmtz9izK6MSsyPl6w9IppCNNK5BL97wd_zgdBUuxIs1kHIqrisvlGQqIebPxgn8_3KuY_M1cie7iIo_3fllLY8oAsGClA-uxVDy0Kb_OTOzE8NwPNe-ebSZ93rJCDkWdVC8DX6poDD8vYRj9GM3zEtkL30qm8fgZeAPiYhb6yuazED5iJ-rESeCIo4GBzVgusep004zmc_91sGp4Mt-LBnc4_O5phLHvUv58GFFjRyfHvq4s30TB4T_nHrvAkwR_ZMy2fJe2bVSjxJCAfZBMK5wbu6xiKaVPoaLjx_NQSjVERJrrbWNJqwY8JDB1Quc1u23QNuG3VHFRm1aYwTWc5L3Nr99UNn89laRMWrC6S1vX2MIl-17Vhb9aCCpedWC6mkNPtS9O5vzDZmCZmDw-p7BOR0DrqUnlnpANCXnxLnyRRigvoF6TvuuR0IkcnazTJFhrwLh4XXWTVPcyhepUO_QF7YZ0fEEncrGMo8FscZmikfYbVIe8fuAA8MCp8tBRTJ0bLXOjkPZ2nrIN3AXyB43gZ_qNFLkrm8mQPrQs14EsK-l6bkWCC6Dwx5vhKsC4lDn4ABhJI8TxOCP_XCo_hMphDhJQ__HZfvtvtgUKtDl5ONNMFBnoEHsmkpQP_pOMuVdaZ0BzvZMgysxqmhDnksvRvh03--Nh0MU01SV6wpxiVUSVz_bJmL2XcDtljpwFQDldAUAM6IxYoiGRY73yfj_pyGXIWjwyixnGnvbxJWar2jxS9YVJGuaTG-AKbhj6BuNXMe8v0FdgsVNVsqlhtXwXd5PJ-xIKlIkqRItU6bR6UOsEA-LmiEhSI2vga_4WCOisk4sn43R8aFyWTbedrBmOEKZJ9rleIpijnosaxJRdrkRyqydjlg8V1mQ-c7Fqrf6CVYjffNkSzEM99qa5Zx-oSkHMJoiAYYQQYhBxx2IVhVjfg39yZzp10nvPNRi7fG2dEi7mJ7Hva3JNJZfUv8lbxo2uCg-nQTvSW15DD8MTKgI2xZHSnvQbZ2_emA1tOnflUziLeDnZe24vBDcQVykQm-L22v3M5Bm6K0qehHeaMNog-QXR.Ufmw94PycWbuZL0Ujfu74A'; form.appendChild(element2); form.method = 'POST'; form.action = it.host; document.body.appendChild(form); form.submit();})(); ERROR: No video formats found!; please report this issue on https://github.com/ytdl-org/youtube-dl/issues , using the appropriate issue template. Make sure you are using the latest version; see https://github.com/ytdl-org/youtube-dl/#user-content-installation on how to update. Be sure to call youtube-dl with the --verbose option and include the complete output. 
Traceback (most recent call last):
  File "C:\Users\DEBOJ\OneDrive\Desktop\opensource projects\ytdl\youtube-dl\youtube_dl\YoutubeDL.py", line 875, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\DEBOJ\OneDrive\Desktop\opensource projects\ytdl\youtube-dl\youtube_dl\YoutubeDL.py", line 982, in __extract_info
    return self.process_ie_result(ie_result, download, extra_info)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\DEBOJ\OneDrive\Desktop\opensource projects\ytdl\youtube-dl\youtube_dl\YoutubeDL.py", line 1016, in process_ie_result
    return self.process_video_result(ie_result, download=download)
           ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\DEBOJ\OneDrive\Desktop\opensource projects\ytdl\youtube-dl\youtube_dl\YoutubeDL.py", line 1752, in process_video_result
    raise ExtractorError('No video formats found!')
youtube_dl.utils.ExtractorError: No video formats found!; please report this issue on https://github.com/ytdl-org/youtube-dl/issues , using the appropriate issue template. Make sure you are using the latest version; see https://github.com/ytdl-org/youtube-dl/#user-content-installation on how to update. Be sure to call youtube-dl with the --verbose option and include the complete output.

On Wed, Mar 19, 2025 at 3:45 PM dirkf ***@***.***> wrote (ytdl-org/youtube-dl#32654, https://github.com/ytdl-org/youtube-dl/issues/32654#issuecomment-2736043762):

> The debugger (import pdb; pdb.set_trace()) is your friend.
>
> - Be sure to review the links (and their linked material) that I posted above. Even the worked example from 2014 or so is still relevant.
> - Did you add your class to extractors/extractors.py?
> - Does your class inherit from InfoExtractor?
> - Does your class have a _VALID_URL that matches the URL of a test page?
> - Does it have a _real_extract() method?
> - Does the method extract an ID from its url argument?
> - Does it fetch the webpage from url and extract the media data from it?
> - If not, does it use some API URL to do that?
> - Does it populate a dictionary with at least id, title and formats values according to the spec in extractors/common.py?
> - Does it call _sort_formats() on the formats?
> - Does it return the dictionary if successful, or else raise ExtractorError?

@dirkf commented on GitHub (Mar 23, 2025):

Aha! When I first looked at this, redirection to the SSO login wasn't a problem.

What you need to do is pull the ZenYandex stuff from yt-dlp, because dzen.ru and zen.yandex.ru appear to work in the same way. The SSO login can normally be bypassed by following the redirection (the retpath value) in the interstitial JS. In this WIP version that I found, _ZY_download_webpage() does this:

ZenYandex back-port with initial dzen.ru support
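# Imports this code relies on; merge them into the import block at the top of
# yandexvideo.py if they are not already there (module locations assumed from
# current youtube-dl master):
#   import itertools
#   from ..compat import compat_kwargs
#   from ..utils import (
#       determine_ext, extract_attributes, int_or_none, merge_dicts,
#       parse_qs, T, traverse_obj, url_or_none,
#   )
class ZenYandexBase(InfoExtractor):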
    def _ZY_download_webpage(self, url, item_id, *args, **kwargs):
        #                               (url_or_request, video_id, note=None, ...
        webpage = self._download_webpage(url, item_id, *args, **kwargs)
        redirect = self._search_json(r'var\s+it\s*=', webpage, 'redirect', item_id, default={}).get('retpath')
        if redirect:
            item_id = self._match_id(redirect)
            note = 'Redirecting'
            if len(args) > 0:
                args = (note,) + args[1:]
            else:
                kwargs['note'] = note
                kwargs = compat_kwargs(kwargs)
            webpage = self._download_webpage(redirect, item_id, *args, **kwargs)
        return webpage

    def _search_ssrData(self, webpage, name, item_id):
        return self._search_json(
            # initial pattern can appear many times, so disambiguate
            r'var\s+_params\s*=\s*\((?=\s*\{\s*"ssrData")', webpage,
            name, item_id)


class ZenYandexIE(ZenYandexBase):
    _VALID_URL = r'https?://(zen\.yandex|dzen)\.ru(?:/video)?/(media|watch)/(?:(?:id/[^/]+/|[^/]+/)(?:[a-z0-9-]+)-)?(?P<id>[a-z0-9-]+)'
    _TESTS = [{
        'url': 'https://zen.yandex.ru/media/id/606fd806cc13cb3c58c05cf5/vot-eto-focus-dedy-morozy-na-gidrociklah-60c7c443da18892ebfe85ed7',
        'info_dict': {
            'id': '60c7c443da18892ebfe85ed7',
            'ext': 'mp4',
            'title': 'ВОТ ЭТО Focus. Деды Морозы на гидроциклах',
            'description': 'md5:f3db3d995763b9bbb7b56d4ccdedea89',
            'thumbnail': r're:^https://avatars\.dzeninfra\.ru/',
            'uploader': 'AcademeG DailyStream',
        },
        'params': {
            'skip_download': 'm3u8',
            'format': 'bestvideo',
        },
        'skip': 'The page does not exist',
    }, {
        'url': 'https://dzen.ru/media/id/606fd806cc13cb3c58c05cf5/vot-eto-focus-dedy-morozy-na-gidrociklah-60c7c443da18892ebfe85ed7',
        'info_dict': {
            'id': '60c7c443da18892ebfe85ed7',
            'ext': 'mp4',
            'title': 'ВОТ ЭТО Focus. Деды Морозы на гидроциклах',
            'description': 'md5:8684912f6086f298f8078d4af0e8a600',
            'thumbnail': r're:^https://avatars\.dzeninfra\.ru/',
            'uploader': 'AcademeG DailyStream',
            'upload_date': '20191111',
            'timestamp': 1573465585,
        },
        'params': {'skip_download': 'm3u8'},
    }, {
        'url': 'https://zen.yandex.ru/video/watch/6002240ff8b1af50bb2da5e3',
        'info_dict': {
            'id': '6002240ff8b1af50bb2da5e3',
            'ext': 'mp4',
            'title': 'Извержение вулкана из спичек: зрелищный опыт',
            'description': 'md5:053ad3c61b5596d510c9a199dc8ee633',
            'thumbnail': r're:^https://avatars\.dzeninfra\.ru/',
            'uploader': 'TechInsider',
            'timestamp': 1611378221,
            'upload_date': '20210123',
        },
        'params': {'skip_download': 'm3u8'},
    }, {
        'url': 'https://dzen.ru/video/watch/6002240ff8b1af50bb2da5e3',
        'info_dict': {
            'id': '6002240ff8b1af50bb2da5e3',
            'ext': 'mp4',
            'title': 'Извержение вулкана из спичек: зрелищный опыт',
            'description': 'md5:053ad3c61b5596d510c9a199dc8ee633',
            'thumbnail': r're:^https://avatars\.dzeninfra\.ru/',
            'uploader': 'TechInsider',
            'upload_date': '20210123',
            'timestamp': 1611378221,
        },
        'params': {'skip_download': 'm3u8'},
    }, {
        'url': 'https://zen.yandex.ru/media/id/606fd806cc13cb3c58c05cf5/novyi-samsung-fold-3-moskvich-barahlit-612f93b7f8d48e7e945792a2?from=channel&rid=2286618386.482.1630817595976.42360',
        'only_matching': True,
    }, {
        'url': 'https://dzen.ru/media/id/606fd806cc13cb3c58c05cf5/novyi-samsung-fold-3-moskvich-barahlit-612f93b7f8d48e7e945792a2?from=channel&rid=2286618386.482.1630817595976.42360',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._ZY_download_webpage(url, video_id)
        data_json = self._search_json(
            r'data\s*=', webpage, 'old video data', video_id, contains_pattern=r'{["\']_*serverState_*video.+}', default=None)
        if data_json:
            server_state = self._search_regex(r'(_+serverState_+video-site_[^_]+_+)',
                                              webpage, 'server state').replace('State', 'Settings')
            video_json = traverse_obj(data_json, (server_state, 'exportData', 'video', T(dict)))
            stream_urls = traverse_obj(video_json, ('video', 'streams', Ellipsis))
        else:
            data_json = self._search_ssrData(webpage, 'video data', video_id)
            video_json = traverse_obj(data_json, ('ssrData', 'videoMetaResponse'))
            stream_urls = traverse_obj(data_json, (
                'ssrData', 'streamsResponse', 'SingleStream', Ellipsis,
                'StreamInfo', Ellipsis, 'OutputStream'))

        def search_uploader(html):
            uploader = self._search_regex(r'(<a\s(?:[^>]*\s)?class\s*=\s*(?P<q>"|\')(?:[\w$.-]+\s)*card-channel-link(?:\s[\w$.-]+)*(?P=q)[^>]*>)',
                                          html, 'uploader', default='')
            return extract_attributes(uploader).get('aria-label')

        formats = []
        subtitles = {}
        for s_url in stream_urls:
            ext = determine_ext(s_url)
            if ext == 'mpd':
                fmts, subs = self._extract_mpd_formats_and_subtitles(s_url, video_id, mpd_id='dash', fatal=False)
                formats.extend(fmts)
                self._merge_subtitles(subs, target=subtitles)
            elif ext == 'm3u8':
                formats.extend(self._extract_m3u8_formats(s_url, video_id, 'mp4', fatal=False))
        self._sort_formats(formats)

        return merge_dicts({
            'id': video_id,
            'formats': formats,
            'subtitles': subtitles,
        }, traverse_obj(video_json, {
            'title': (('title', T(lambda _: self._og_search_title(webpage))),),
            'description': (('description',
                            T(lambda _: self._og_search_description(webpage, default=None))),),
            'duration': ((None, 'video'), 'duration', T(int_or_none)),
            'view_count': ((None, 'video'), 'views', T(int_or_none)),
            'timestamp': ('publicationDate', T(int_or_none)),
            'uploader': (('source', 'title'), T(lambda _: search_uploader(webpage)),),
            'thumbnail': (('image',
                           T(lambda _: self._og_search_thumbnail(webpage, default=None))),
                          T(url_or_none)),
            'age_limit': ('video', 'adult', T(lambda x: 18 if x else None)),
        }, get_all=False), traverse_obj(data_json, {
            'uploader': (('authorName', ('publisher', 'name')),),
            'description': ('og', 'description'),
            'thumbnail': ('og', 'imageUrl', T(url_or_none)),
        }, get_all=False), traverse_obj(video_json, {
            'tags': ('tags', Ellipsis),
        }))


class ZenYandexChannelIE(ZenYandexBase):
    _VALID_URL = r'https?://(zen\.yandex|dzen)\.ru/(?!media|video)(?:id/)?(?P<id>[a-z0-9_-]+)'
    _TESTS = [{
        'url': 'https://zen.yandex.ru/tok_media',
        'info_dict': {
            'id': 'tok_media',
            'title': 'СПЕКТР',
            'description': 'md5:a9e5b3c247b7fe29fd21371a428bcf56',
        },
        'playlist_mincount': 169,
        'skip': 'The page does not exist',
    }, {
        'url': 'https://dzen.ru/tok_media',
        'info_dict': {
            'id': 'tok_media',
            'title': 'СПЕКТР',
            'description': 'md5:a9e5b3c247b7fe29fd21371a428bcf56',
        },
        'playlist_mincount': 169,
        'skip': 'The page does not exist',
    }, {
        'url': 'https://zen.yandex.ru/id/606fd806cc13cb3c58c05cf5',
        'info_dict': {
            'id': '606fd806cc13cb3c58c05cf5',
            'description': 'md5:517b7c97d8ca92e940f5af65448fd928',
            'title': 'AcademeG DailyStream',
        },
        'playlist_mincount': 657,
    }, {
        # Test that the playlist extractor finishes extracting when the
        # channel has less than one page
        'url': 'https://zen.yandex.ru/jony_me',
        'info_dict': {
            'id': 'jony_me',
            'description': 'md5:a2c62b4ef5cf3e3efb13d25f61f739e1',
            'title': 'JONY ',
        },
        'playlist_count': 18,
    }, {
        # Test that the playlist extractor finishes extracting when the
        # channel has more than one page of entries
        'url': 'https://zen.yandex.ru/tatyanareva',
        'info_dict': {
            'id': 'tatyanareva',
            'description': 'md5:40a1e51f174369ec3ba9d657734ac31f',
            'title': 'Татьяна Рева',
            'entries': 'maxcount:200',
        },
        'playlist_mincount': 46,
    }, {
        'url': 'https://dzen.ru/id/606fd806cc13cb3c58c05cf5',
        'info_dict': {
            'id': '606fd806cc13cb3c58c05cf5',
            'title': 'AcademeG DailyStream',
            'description': 'md5:517b7c97d8ca92e940f5af65448fd928',
        },
        'playlist_mincount': 657,
    }]

    def _entries(self, item_id, server_state_json, server_settings_json):
        server_json = (server_state_json, server_settings_json)
        data = traverse_obj(server_json, (0, 'feed'), (1, 'exportData'),
                            expected_type=lambda x: x['items'][0])

        more = traverse_obj(server_json, (0, 'links', 'more'), (1, 'exportData', 'more', 'link'))

        next_page_id = None
        for page in itertools.count(1):
            for item in traverse_obj(data, ('items', lambda _, v: v.get('type') == 'gif')):
                video_id = traverse_obj(
                    item, 'id',
                    (('publication_id', 'publicationId'), T(lambda s: s.rsplit(':', 1)[-1])),
                    get_all=False) or ''
                yield self.url_result(item['link'], ZenYandexIE.ie_key(), video_id)

            current_page_id = next_page_id
            next_page_id = traverse_obj(parse_qs(more), ('next_page_id', -1))
            if not all((more, next_page_id, next_page_id != current_page_id)):
                break

            data = self._download_json(more, item_id, note='Downloading page {0}'.format(page))
            more = traverse_obj(data, ('more', 'link'))

    def _real_extract(self, url):
        item_id = self._match_id(url)
        webpage = self._ZY_download_webpage(url, item_id)
        data = self._search_json(
            r'var\s+data\s*=', webpage, 'old channel data', item_id, contains_pattern=r'{\"__serverState__.+}', default=None)
        if data:
            server_state_json = traverse_obj(data, lambda k, _: k.startswith('__serverState__'), get_all=False)
            server_settings_json = traverse_obj(data, lambda k, _: k.startswith('__serverSettings__'), get_all=False)
        else:
            data = self._search_ssrData(webpage, 'channel data', item_id)
            server_state_json = {}
            server_settings_json = traverse_obj(data, ('ssrData', 'exportResponse', {'exportData': 'feedData'}))

        return self.playlist_result(
            self._entries(item_id, server_state_json, server_settings_json),
            item_id, traverse_obj(server_state_json, ('channel', 'source', 'title')),
            traverse_obj(server_state_json, ('channel', 'source', 'description')))

This should be appended to the existing code in extractor/yandexvideo.py.
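
To make the redirect step easier to test in isolation, here is a rough standalone sketch of what `_ZY_download_webpage()` does before re-fetching: find the object assigned to `var it`, read its `retpath`, and derive the item ID from that URL with the same `_VALID_URL`. It uses only the standard library instead of `_search_json()` and `_match_id()`. The names `find_retpath`, `match_id` and `SAMPLE_PAGE` are made up for illustration, the sample page is invented (a real interstitial may be laid out differently), and the sketch assumes the object literal is valid JSON.

```python
import json
import re

# _VALID_URL copied from ZenYandexIE above
VALID_URL = (
    r'https?://(zen\.yandex|dzen)\.ru(?:/video)?/(media|watch)/'
    r'(?:(?:id/[^/]+/|[^/]+/)(?:[a-z0-9-]+)-)?(?P<id>[a-z0-9-]+)'
)

# Hypothetical interstitial page, shaped like the SSO redirect script
SAMPLE_PAGE = '''
<script>(function(){var it = {"host": "https://sso.example/install",
"retpath": "https://dzen.ru/video/watch/6002240ff8b1af50bb2da5e3"};
var form = document.createElement('form');
form.action = it.host; form.submit();})();</script>
'''


def find_retpath(webpage):
    """Return the 'retpath' of the object assigned to `var it`, or None."""
    m = re.search(r'var\s+it\s*=\s*', webpage)
    if not m:
        return None
    try:
        # raw_decode() parses one JSON value starting at the given index
        # and ignores whatever JS follows it
        obj, _ = json.JSONDecoder().raw_decode(webpage, m.end())
    except ValueError:
        return None
    return obj.get('retpath') if isinstance(obj, dict) else None


def match_id(url):
    """Stand-in for InfoExtractor._match_id(): the 'id' group, or None."""
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None


if __name__ == '__main__':
    redirect = find_retpath(SAMPLE_PAGE)
    print(redirect, match_id(redirect))
```

If `find_retpath()` returns None for a page you saved from a real request, the interstitial probably does not match the `var it = {...}` shape assumed here, and the pattern in `_ZY_download_webpage()` would need adjusting in the same way.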


@dirkf commented on GitHub (Apr 7, 2025):

Now: yt-dlp/yt-dlp#12852
