Please support dzen.ru quickly

deekerman commented

2026-02-21 14:28:11 -05:00

Owner

Originally created by @aflamrip on GitHub (Dec 2, 2023).

Originally assigned to: @Revisto on GitHub.

Checklist

I'm reporting a new site support request
I've verified that I'm running youtube-dl version 2021.12.17
I've checked that all provided URLs are alive and playable in a browser
I've checked that none of provided URLs violate any copyrights
I've searched the bugtracker for similar site support requests including closed ones

Example URLs

Single video: https://dzen.ru/video/watch/6471e50e06863726828b435c
Single video: https://dzen.ru/embed/vnVEaPfaSym8?from_block=partner&from=zen&mute=0&autoplay=0&tv=0
Playlist: nothing

Description

It is a very popular Russian site and needs to be supported.

deekerman added the site-support-request and Good first issue labels 2026-02-21 14:28:11 -05:00

@dirkf commented on GitHub (Dec 2, 2023):

Please review and complete the checklist (why is it there?).

The non-JS page seen by yt-dl contains a gigantic hydration JSON object as a parameter of a JS function that defines the page. Video links can be extracted using the traversal path 'data', ..., 0, 'videoViewer', 'items', ..., 0, 'rawStreams', 'SingleStream', ..., 'StreamInfo', ..., 'OutputStream'. 'Title' and 'Thumbnail' are available alongside 'StreamInfo'. I was able to watch the clip about the fake IKEA (why not ШведГаус?) using one of these links in mpv.
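
The traversal path above can be illustrated with a small standalone sketch. This is not youtube-dl's own helper: `walk`, `HYDRATION` and `PATH` are hypothetical names, the nesting of the toy object is invented (only the key names come from the comment), and the reading that `...` branches over every value while an integer indexes each branch is an assumption.

```python
def walk(obj, path):
    """Return every value reached by following `path` through nested dicts/lists.

    A key of Ellipsis (...) branches over all values of a dict or all items
    of a list; any other key is a plain lookup. Missing keys and
    out-of-range indexes yield nothing instead of raising.
    """
    results = [obj]
    for key in path:
        next_results = []
        for node in results:
            if key is Ellipsis:
                if isinstance(node, dict):
                    next_results.extend(node.values())
                elif isinstance(node, list):
                    next_results.extend(node)
            elif isinstance(node, dict):
                if key in node:
                    next_results.append(node[key])
            elif isinstance(node, list) and isinstance(key, int):
                if -len(node) <= key < len(node):
                    next_results.append(node[key])
        results = next_results
    return results


# Invented toy data: the real hydration object is far larger and may be
# shaped differently. The URLs are placeholders.
HYDRATION = {
    'data': {
        'page': [{
            'videoViewer': {
                'items': {
                    'item-1': [{
                        'rawStreams': {
                            'SingleStream': [{
                                'Title': 'Example clip',
                                'Thumbnail': 'https://example.invalid/t.jpg',
                                'StreamInfo': [
                                    {'OutputStream': 'https://example.invalid/a.m3u8'},
                                    {'OutputStream': 'https://example.invalid/b.mpd'},
                                ],
                            }],
                        },
                    }],
                },
            },
        }],
    },
}

PATH = ('data', ..., 0, 'videoViewer', 'items', ..., 0,
        'rawStreams', 'SingleStream', ..., 'StreamInfo', ..., 'OutputStream')

stream_urls = walk(HYDRATION, PATH)
titles = walk(HYDRATION, PATH[:-3] + ('Title',))
```

Dropping the last three path elements and appending `'Title'` reaches the sibling of `'StreamInfo'`, which is how the title and thumbnail could be picked up alongside the stream links.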

@aflamrip commented on GitHub (Dec 3, 2023):

I have no experience in understanding these matters, but I hope that you will support downloading from this site, as it is a strong competitor to ok.ru and vk.com.

@dirkf commented on GitHub (Dec 3, 2023):

I encourage someone who's interested to take this up. Otherwise it's at the end of a long queue.

@Revisto commented on GitHub (Dec 9, 2023):

Hi, can I work on this issue as my first contribution to youtube-dl?

@dirkf commented on GitHub (Dec 9, 2023):

By all means. Are you happy that you know what to do? See #29310 in addition to the manual and FAQ.

@3052 commented on GitHub (Dec 11, 2023):

note this uBlock Origin rule is needed for the web client:

dzen.ru dzeninfra.ru * noop

@aflamrip commented on GitHub (Dec 15, 2023):

I am using a PeerTube instance that uses ytdl to import videos, and this is the error that appears:
Command failed with exit code 1: /usr/bin/python3 /var/www/peertube/storage/bin/yt-dlp --merge-output-format mp4 -f bestvideo[vcodec^=avc1][height=720]+bestaudio[ext=m4a]/bestvideo[vcodec!*=av01][vcodec!*=vp9.2][height=720]+bestaudio/bestvideo[vcodec^=avc1][height<=720]+bestaudio[ext=m4a]/bestvideo[vcodec!*=av01][vcodec!*=vp9.2]+bestaudio/best[vcodec!*=av01][vcodec!*=vp9.2]/bestvideo[ext=mp4]+bestaudio[ext=m4a]/best -o /var/www/peertube/storage/tmp/06fa3722e25f07e41b6405880136fbef4fe5dae7fe715463a14c23722933cf18-import https://dzen.ru/video/watch/651a03f748b261411c264611
ERROR: 'NoneType' object is not iterable
[ZenYandex] Extracting URL: https://dzen.ru/video/watch/651a03f748b261411c264611
[ZenYandex] 651a03f748b261411c264611: Downloading webpage
[ZenYandex] 651a03f748b261411c264611: Redirecting

@dirkf commented on GitHub (Dec 15, 2023):

How is this relevant to youtube-dl's support for dzen.ru? Please review #30839:

Command failed with exit code 1: /usr/bin/python3 /var/www/peertube/storage/bin/yt-dlp ...

"If you were actually using yt-dlp ..."

@abhimessi16 commented on GitHub (Dec 21, 2023):

Hi @dirkf, is @Revisto still working on the issue? If not, I would like to work on it.

@dirkf commented on GitHub (Dec 21, 2023):

There's no PR yet. Unless we hear otherwise, carry on.

@Lil-Nugs commented on GitHub (Jan 28, 2024):

Hi, I'm working on this right now, but was wondering if there's a certain way we should handle redirects, and how you got the page downloaded, @dirkf? I'm running into an issue where it hits these two URLs before the final webpage with the video and JSON object:

https://sso.passport.yandex.ru/push?uuid=d738a747-82b1-432d-8536-30918ebb94aa&retpath=https%3A%2F%2Fdzen.ru%2Fvideo%2Fwatch%2F6471e50e06863726828b435c - sends back a 200
https://sso.dzen.ru/install?uuid=604f2166-bd61-4d1a-bfed-715e331e51d6 - sends back a 200

So the _download_webpage_handle function is downloading the empty page at the first page it gets redirected to since they aren't sending back 302 for redirect afterwards.

@Lil-Nugs commented on GitHub (Jan 28, 2024):

Hm looking at some other issues actually, seems like this could be bypassed by grabbing the cookies after visiting those URL's

@dirkf commented on GitHub (Jan 28, 2024):

Exactly.

I found that I could just download the second problem video and find the media link. Looking at the first one, it seems to go to the SSO link that you mention, presumably because it needs a login. So the first thing to try is passing cookies from a logged-in browser session that can play that video.

Getting --username ... --password ... to work would be ideal but experience indicates that today's login procedures are either too hard to implement or too transient or both. But maybe passport.yandex.ru has been handled already -- I haven't checked.

@Debojit-0 commented on GitHub (Mar 14, 2025):

@dirkf
Let me also try to contribute to this, as it would be my first contribution to an open source project.

@dirkf commented on GitHub (Mar 14, 2025):

No-one has reported any further progress: have a go. Per https://github.com/ytdl-org/youtube-dl/issues/29724#issuecomment-891937212 and linked issue #29310, make sure to review the base class (extractor/common.py) and some other new extractors to understand the typical style and API usage.

@nolanwelch commented on GitHub (Mar 14, 2025):

@Debojit-0 I previously made an attempt at implementing this extractor; please feel free to use my code at https://github.com/nolanwelch/youtube-dl/blob/dzen/youtube_dl/extractor/dzenru.py if you find it helpful.

@Debojit-0 commented on GitHub (Mar 14, 2025):

And were you able to resolve it ?

@nolanwelch commented on GitHub (Mar 14, 2025):

Unfortunately no. But I think it's a good start.

@Debojit-0 commented on GitHub (Mar 14, 2025):

OK, fair enough; it's a good start for me as well, as I am still in the learning phase 🙂

@Debojit-0 commented on GitHub (Mar 19, 2025):

@nolanwelch @dirkf
The issue is that authentication is blocking me.
Here is a summary of what I have understood: this is a package that helps you download both webpages and videos from different sources. These sources are stored in the extractor folder, which has multiple .py files, each representing a different website, so I need to create a dzen file for the Dzen website, which is a Russian website.
So I created the class for this website in the extractor folder and ran the test file as well, but I get the error
No video formats found!

Now this could be for many reasons:

Outdated youtube-dl: The site structure might have changed, and youtube-dl needs to be updated.
Authentication Issue: The video might be behind a login wall, requiring authentication.
Region Restrictions: The video might not be available in your region.
Invalid URL: The URL might be incorrect or inaccessible.

So any help would be great, as I am new to this. I am trying to find the solution in the meantime, in case someone can help or guide me.

Thank you

@dirkf commented on GitHub (Mar 19, 2025):

The debugger (import pdb; pdb.set_trace()) is your friend.

Be sure to review the links (and their linked material) that I posted above. Even the worked example from 2014 or so is still relevant.
Did you add your class to extractors/extractors.py?
Does your class inherit from InfoExtractor?
Does your class have a _VALID_URL that matches the URL of a test page?
Does it have a _real_extract() method?
Does the method extract an ID from its url argument?
Does it fetch the webpage from url and extract the media data from it?
If not, does it use some API URL to do that?
Does it populate a dictionary with at least id, title and formats values according to the spec in extractors/common.py?
Does it call _sort_formats() on the formats?
Does it return the dictionary if successful, or else raise ExtractorError?
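
As an illustration of the `_VALID_URL` and ID-extraction items in the list above, here is a minimal standalone sketch using only the two example URLs from the original request. The pattern is hypothetical (the site may well use other URL shapes), and `match_id` is an illustrative stand-in written with plain `re`, not the base class's own code.

```python
import re

# Hypothetical pattern covering only the two URL shapes given in the request:
#   https://dzen.ru/video/watch/<id>
#   https://dzen.ru/embed/<id>?...
_VALID_URL = r'https?://(?:www\.)?dzen\.ru/(?:video/watch|embed)/(?P<id>[\w-]+)'


def match_id(url):
    """Return the video ID captured by _VALID_URL, or None if the URL does not match."""
    mobj = re.match(_VALID_URL, url)
    return mobj.group('id') if mobj else None


watch_id = match_id('https://dzen.ru/video/watch/6471e50e06863726828b435c')
embed_id = match_id(
    'https://dzen.ru/embed/vnVEaPfaSym8?from_block=partner&from=zen&mute=0&autoplay=0&tv=0')
```

Checking the pattern against the test page URL in isolation like this is a quick way to rule out a `_VALID_URL` mismatch before looking at the extraction code itself.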

@Debojit-0 commented on GitHub (Mar 23, 2025):

Did you add your class to extractors/extractors.py? --> yes
Does your class inherit from InfoExtractor? --> yes
Does your class have a _VALID_URL that matches the URL of a test page? --> I guess yes
Does it have a _real_extract() method? --> yes
Does the method extract an ID from its url argument? --> yes, not as a method, but I can see it extracting the URL, which redirects me to the login page
Does it populate a dictionary with at least id, title and formats values according to the spec in extractors/common.py? --> no
Does it call _sort_formats() on the formats? --> how do I check this?
Does it return the dictionary if successful, or else raise ExtractorError? --> it raises ExtractorError as no format is found, since it is directing me to the login authentication page

here is the error

EReturns the info extractor class with the given ie_name <class
'youtube_dl.extractor.dzen.DzenIE'>
Returns the info extractor class with the given ie_name <class
'youtube_dl.extractor.dzen.DzenIE'>
[Dzen] 6471e50e06863726828b435c: Downloading webpage
webpage1
scripts var it =
{"host":"https:\u002F\u002Fsso.dzen.ru\u002Finstall?uuid=c81f47cc-9856-4de2-9ffa-ed0beda56016","retpath":"https:\u002F\u002Fdzen.ru\u002Fvideo\u002Fwatch\u002F6471e50e06863726828b435c?is_autologin_ya=true"};(function()
{ var form = document.createElement('form'); var element1 =
document.createElement('input'); var element2 =
document.createElement('input'); element1.name = 'retpath';
element1.type = 'hidden'; element1.value = it.retpath;
form.appendChild(element1); element2.name = 'container'; element2.type =
'hidden'; element2.value =
'1742744099.10153489.3OXfvFuN4Jw6zs2g.PVdwL66GVBOAVp7NlR5qKPyORf2lQwsFdqQDVUWHd3gQ1XXBC9NCvnxmL9v6LJzQ23DMBlH_jTS1PqUGNjlFmBxBOvO05rCsgF06VtUP5cfEUFQS3QVlQsoO2j6bPmtz9izK6MSsyPl6w9IppCNNK5BL97wd_zgdBUuxIs1kHIqrisvlGQqIebPxgn8_3KuY_M1cie7iIo_3fllLY8oAsGClA-uxVDy0Kb_OTOzE8NwPNe-ebSZ93rJCDkWdVC8DX6poDD8vYRj9GM3zEtkL30qm8fgZeAPiYhb6yuazED5iJ-rESeCIo4GBzVgusep004zmc_91sGp4Mt-LBnc4_O5phLHvUv58GFFjRyfHvq4s30TB4T_nHrvAkwR_ZMy2fJe2bVSjxJCAfZBMK5wbu6xiKaVPoaLjx_NQSjVERJrrbWNJqwY8JDB1Quc1u23QNuG3VHFRm1aYwTWc5L3Nr99UNn89laRMWrC6S1vX2MIl-17Vhb9aCCpedWC6mkNPtS9O5vzDZmCZmDw-p7BOR0DrqUnlnpANCXnxLnyRRigvoF6TvuuR0IkcnazTJFhrwLh4XXWTVPcyhepUO_QF7YZ0fEEncrGMo8FscZmikfYbVIe8fuAA8MCp8tBRTJ0bLXOjkPZ2nrIN3AXyB43gZ_qNFLkrm8mQPrQs14EsK-l6bkWCC6Dwx5vhKsC4lDn4ABhJI8TxOCP_XCo_hMphDhJQ__HZfvtvtgUKtDl5ONNMFBnoEHsmkpQP_pOMuVdaZ0BzvZMgysxqmhDnksvRvh03--Nh0MU01SV6wpxiVUSVz_bJmL2XcDtljpwFQDldAUAM6IxYoiGRY73yfj_pyGXIWjwyixnGnvbxJWar2jxS9YVJGuaTG-AKbhj6BuNXMe8v0FdgsVNVsqlhtXwXd5PJ-xIKlIkqRItU6bR6UOsEA-LmiEhSI2vga_4WCOisk4sn43R8aFyWTbedrBmOEKZJ9rleIpijnosaxJRdrkRyqydjlg8V1mQ-c7Fqrf6CVYjffNkSzEM99qa5Zx-oSkHMJoiAYYQQYhBxx2IVhVjfg39yZzp10nvPNRi7fG2dEi7mJ7Hva3JNJZfUv8lbxo2uCg-nQTvSW15DD8MTKgI2xZHSnvQbZ2_emA1tOnflUziLeDnZe24vBDcQVykQm-L22v3M5Bm6K0qehHeaMNog-QXR.Ufmw94PycWbuZL0Ujfu74A';
form.appendChild(element2); form.method = 'POST'; form.action = it.host;
document.body.appendChild(form); form.submit();})();
ERROR: No video formats found!; please report this issue on
https://github.com/ytdl-org/youtube-dl/issues , using the appropriate
issue template. Make sure you are using the latest version; see
https://github.com/ytdl-org/youtube-dl/#user-content-installation on
how to update. Be sure to call youtube-dl with the --verbose option and
include the complete output.
Traceback (most recent call last):
File "C:\Users\DEBOJ\OneDrive\Desktop\opensource
projects\ytdl\youtube-dl\youtube_dl\YoutubeDL.py", line 875, in wrapper
return func(self, *args, **kwargs)
File "C:\Users\DEBOJ\OneDrive\Desktop\opensource
projects\ytdl\youtube-dl\youtube_dl\YoutubeDL.py", line 982, in
__extract_info
return self.process_ie_result(ie_result, download, extra_info)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DEBOJ\OneDrive\Desktop\opensource
projects\ytdl\youtube-dl\youtube_dl\YoutubeDL.py", line 1016, in
process_ie_result
return self.process_video_result(ie_result, download=download)
~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DEBOJ\OneDrive\Desktop\opensource
projects\ytdl\youtube-dl\youtube_dl\YoutubeDL.py", line 1752, in
process_video_result
raise ExtractorError('No video formats found!')
youtube_dl.utils.ExtractorError: No video formats found!; please report
this issue on https://github.com/ytdl-org/youtube-dl/issues , using the
appropriate issue template. Make sure you are using the latest version; see
https://github.com/ytdl-org/youtube-dl/#user-content-installation on
how to update. Be sure to call youtube-dl with the --verbose
option and include the complete output.
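
The page dumped in the log above is the SSO interstitial rather than the video page: a script that builds a hidden form with `retpath` and `container` fields and POSTs it to the `host` URL. A minimal sketch of recognising that page and pulling out the form data follows. `parse_sso_interstitial` and `SAMPLE` are hypothetical names, the regular expressions are written against the one dump shown here, and whether submitting the form actually leads to the video page is not established in this thread.

```python
import json
import re


def parse_sso_interstitial(webpage):
    """Return the form target and fields if `webpage` looks like the SSO page, else None."""
    # The script starts with: var it = {"host": ..., "retpath": ...};
    config = re.search(r'var\s+it\s*=\s*(\{.*?\})\s*;', webpage, re.DOTALL)
    # The token is assigned to the second hidden input: element2.value = '...';
    container = re.search(r"element2\.value\s*=\s*'([^']+)'", webpage)
    if not config or not container:
        return None
    it = json.loads(config.group(1))  # also decodes the \u002F escapes to '/'
    if 'host' not in it or 'retpath' not in it:
        return None
    return {
        'action': it['host'],
        'fields': {'retpath': it['retpath'], 'container': container.group(1)},
    }


# Shortened stand-in for the dumped page; the uuid and token are placeholders.
SAMPLE = (
    "<body></body><script>var it = "
    '{"host":"https:\\u002F\\u002Fsso.dzen.ru\\u002Finstall?uuid=0000",'
    '"retpath":"https:\\u002F\\u002Fdzen.ru\\u002Fvideo\\u002Fwatch\\u002Fabc"};'
    "(function() { var form = document.createElement('form'); "
    "element2.name = 'container'; element2.value = 'TOKEN'; "
    "form.method = 'POST'; form.action = it.host; form.submit();})();</script>"
)

sso = parse_sso_interstitial(SAMPLE)
```

A check like this lets an extractor tell "got the interstitial" apart from "page layout changed", so it can report something more useful than "No video formats found!".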

On Wed, Mar 19, 2025 at 3:45 PM dirkf @.***> wrote:

The debugger (import pdb; pdb.set_trace()) is your friend.

Be sure to review the links (and their linked material) that I
posted above. Even the worked example from 2014 or so is still relevant.

Did you add your class to extractors/extractors.py?

Does your class inherit from InfoExtractor?

Does your class have a _VALID_URL that matches the URL of a test
page?

Does it have a _real_extract() method?

Does the method extract an ID from its url argument?

Does it fetch the webpage from url and extract the media data from
it?

If not, does it use some API URL to do that?

Does it populate a dictionary with at least id, title and formats
values according to the spec in extractors/common.py?

Does it call _sort_formats() on the formats?

Does it return the dictionary if successful, or else raise
ExtractorError?

—
Reply to this email directly, view it on GitHub
https://github.com/ytdl-org/youtube-dl/issues/32654#issuecomment-2736043762,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/ASNBXIOYO3VCATTT7NMBRC32VE7V3AVCNFSM6AAAAABZAEJPQ6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDOMZWGA2DGNZWGI
.
You are receiving this because you were mentioned.Message ID:
@.***>
[image: dirkf]dirkf left a comment (ytdl-org/youtube-dl#32654)
https://github.com/ytdl-org/youtube-dl/issues/32654#issuecomment-2736043762

The debugger (import pdb; pdb.set_trace()) is your friend.

Be sure to review the links (and their linked material) that I
posted above. Even the worked example from 2014 or so is still relevant.

Did you add your class to extractors/extractors.py?

Does your class inherit from InfoExtractor?

Does your class have a _VALID_URL that matches the URL of a test
page?

Does it have a _real_extract() method?

Does the method extract an ID from its url argument?

Does it fetch the webpage from url and extract the media data from
it?

If not, does it use some API URL to do that?

Does it populate a dictionary with at least id, title and formats
values according to the spec in extractors/common.py?

Does it call _sort_formats() on the formats?

Does it return the dictionary if successful, or else raise
ExtractorError?

—
Reply to this email directly, view it on GitHub
https://github.com/ytdl-org/youtube-dl/issues/32654#issuecomment-2736043762,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/ASNBXIOYO3VCATTT7NMBRC32VE7V3AVCNFSM6AAAAABZAEJPQ6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDOMZWGA2DGNZWGI
.
You are receiving this because you were mentioned.Message ID:
@.***>

@Debojit-0 commented on GitHub (Mar 23, 2025): - Did you add your class to extractors/extractors.py? --> yes - Does your class inherit from InfoExtractor? --> yes - Does your class inherit from InfoExtractor? --> yes - Does your class have a _VALID_URL that matches the URL of a test page? --> i guess yes - Does it have a _real_extract() method? --yes - Does the method extract an ID from its url argument? --> yes not method but i can see its extracting the url but redirecting me to login page\Does it populate a dictionary with at least id, title and formats values according to the spec in extractors/common.py? --> no - Does it call _sort_formats() on the formats? --> how do i check this - Does it return the dictionary if successful, or else raise ExtractorError? --> raise ExtractorError as format not found as its directing me to the login authentication page here is the error EReturns the info extractor class with the given ie_name <class 'youtube_dl.extractor.dzen.DzenIE'> Returns the info extractor class with the given ie_name <class 'youtube_dl.extractor.dzen.DzenIE'> [Dzen] 6471e50e06863726828b435c: Downloading webpage webpage1 <body></body><script nonce='7065aec34050c5d5fd01729c19236188'>var it = {"host":"https:\u002F\ u002Fsso.dzen.ru\u002Finstall?uuid=c81f47cc-9856-4de2-9ffa-ed0beda56016","retpath":"https:\u002F\u002Fdzen.ru\u002Fvideo\u002Fwatch\u002F6471e50e06863726828b435c?is_autologin_ya=true"};(function() { var form = document.createElement('form'); var element1 = document.createElement('input'); var element2 = document.createElement('input'); element1.name = 'retpath'; element1.type = 'hidden'; element1.value = it.retpath; form.appendChild(element1); element2.name = 'container'; element2.type = 'hidden'; element2.value = 
'1742744099.10153489.3OXfvFuN4Jw6zs2g.PVdwL66GVBOAVp7NlR5qKPyORf2lQwsFdqQDVUWHd3gQ1XXBC9NCvnxmL9v6LJzQ23DMBlH_jTS1PqUGNjlFmBxBOvO05rCsgF06VtUP5cfEUFQS3QVlQsoO2j6bPmtz9izK6MSsyPl6w9IppCNNK5BL97wd_zgdBUuxIs1kHIqrisvlGQqIebPxgn8_3KuY_M1cie7iIo_3fllLY8oAsGClA-uxVDy0Kb_OTOzE8NwPNe-ebSZ93rJCDkWdVC8DX6poDD8vYRj9GM3zEtkL30qm8fgZeAPiYhb6yuazED5iJ-rESeCIo4GBzVgusep004zmc_91sGp4Mt-LBnc4_O5phLHvUv58GFFjRyfHvq4s30TB4T_nHrvAkwR_ZMy2fJe2bVSjxJCAfZBMK5wbu6xiKaVPoaLjx_NQSjVERJrrbWNJqwY8JDB1Quc1u23QNuG3VHFRm1aYwTWc5L3Nr99UNn89laRMWrC6S1vX2MIl-17Vhb9aCCpedWC6mkNPtS9O5vzDZmCZmDw-p7BOR0DrqUnlnpANCXnxLnyRRigvoF6TvuuR0IkcnazTJFhrwLh4XXWTVPcyhepUO_QF7YZ0fEEncrGMo8FscZmikfYbVIe8fuAA8MCp8tBRTJ0bLXOjkPZ2nrIN3AXyB43gZ_qNFLkrm8mQPrQs14EsK-l6bkWCC6Dwx5vhKsC4lDn4ABhJI8TxOCP_XCo_hMphDhJQ__HZfvtvtgUKtDl5ONNMFBnoEHsmkpQP_pOMuVdaZ0BzvZMgysxqmhDnksvRvh03--Nh0MU01SV6wpxiVUSVz_bJmL2XcDtljpwFQDldAUAM6IxYoiGRY73yfj_pyGXIWjwyixnGnvbxJWar2jxS9YVJGuaTG-AKbhj6BuNXMe8v0FdgsVNVsqlhtXwXd5PJ-xIKlIkqRItU6bR6UOsEA-LmiEhSI2vga_4WCOisk4sn43R8aFyWTbedrBmOEKZJ9rleIpijnosaxJRdrkRyqydjlg8V1mQ-c7Fqrf6CVYjffNkSzEM99qa5Zx-oSkHMJoiAYYQQYhBxx2IVhVjfg39yZzp10nvPNRi7fG2dEi7mJ7Hva3JNJZfUv8lbxo2uCg-nQTvSW15DD8MTKgI2xZHSnvQbZ2_emA1tOnflUziLeDnZe24vBDcQVykQm-L22v3M5Bm6K0qehHeaMNog-QXR.Ufmw94PycWbuZL0Ujfu74A'; form.appendChild(element2); form.method = 'POST'; form.action = it.host; document.body.appendChild(form); form.submit();})();</script> scripts var it = {"host":"https:\u002F\u002Fsso.dzen.ru\u002Finstall?uuid=c81f47cc-9856-4de2-9ffa-ed0beda56016","retpath":"https:\u002F\u002Fdzen.ru\u002Fvideo\u002Fwatch\u002F6471e50e06863726828b435c?is_autologin_ya=true"};(function() { var form = document.createElement('form'); var element1 = document.createElement('input'); var element2 = document.createElement('input'); element1.name = 'retpath'; element1.type = 'hidden'; element1.value = it.retpath; form.appendChild(element1); element2.name = 'container'; element2.type = 'hidden'; element2.value = 
'1742744099.10153489.3OXfvFuN4Jw6zs2g.PVdwL66GVBOAVp7NlR5qKPyORf2lQwsFdqQDVUWHd3gQ1XXBC9NCvnxmL9v6LJzQ23DMBlH_jTS1PqUGNjlFmBxBOvO05rCsgF06VtUP5cfEUFQS3QVlQsoO2j6bPmtz9izK6MSsyPl6w9IppCNNK5BL97wd_zgdBUuxIs1kHIqrisvlGQqIebPxgn8_3KuY_M1cie7iIo_3fllLY8oAsGClA-uxVDy0Kb_OTOzE8NwPNe-ebSZ93rJCDkWdVC8DX6poDD8vYRj9GM3zEtkL30qm8fgZeAPiYhb6yuazED5iJ-rESeCIo4GBzVgusep004zmc_91sGp4Mt-LBnc4_O5phLHvUv58GFFjRyfHvq4s30TB4T_nHrvAkwR_ZMy2fJe2bVSjxJCAfZBMK5wbu6xiKaVPoaLjx_NQSjVERJrrbWNJqwY8JDB1Quc1u23QNuG3VHFRm1aYwTWc5L3Nr99UNn89laRMWrC6S1vX2MIl-17Vhb9aCCpedWC6mkNPtS9O5vzDZmCZmDw-p7BOR0DrqUnlnpANCXnxLnyRRigvoF6TvuuR0IkcnazTJFhrwLh4XXWTVPcyhepUO_QF7YZ0fEEncrGMo8FscZmikfYbVIe8fuAA8MCp8tBRTJ0bLXOjkPZ2nrIN3AXyB43gZ_qNFLkrm8mQPrQs14EsK-l6bkWCC6Dwx5vhKsC4lDn4ABhJI8TxOCP_XCo_hMphDhJQ__HZfvtvtgUKtDl5ONNMFBnoEHsmkpQP_pOMuVdaZ0BzvZMgysxqmhDnksvRvh03--Nh0MU01SV6wpxiVUSVz_bJmL2XcDtljpwFQDldAUAM6IxYoiGRY73yfj_pyGXIWjwyixnGnvbxJWar2jxS9YVJGuaTG-AKbhj6BuNXMe8v0FdgsVNVsqlhtXwXd5PJ-xIKlIkqRItU6bR6UOsEA-LmiEhSI2vga_4WCOisk4sn43R8aFyWTbedrBmOEKZJ9rleIpijnosaxJRdrkRyqydjlg8V1mQ-c7Fqrf6CVYjffNkSzEM99qa5Zx-oSkHMJoiAYYQQYhBxx2IVhVjfg39yZzp10nvPNRi7fG2dEi7mJ7Hva3JNJZfUv8lbxo2uCg-nQTvSW15DD8MTKgI2xZHSnvQbZ2_emA1tOnflUziLeDnZe24vBDcQVykQm-L22v3M5Bm6K0qehHeaMNog-QXR.Ufmw94PycWbuZL0Ujfu74A'; form.appendChild(element2); form.method = 'POST'; form.action = it.host; document.body.appendChild(form); form.submit();})(); ERROR: No video formats found!; please report this issue on https://github.com/ytdl-org/youtube-dl/issues , using the appropriate issue template. Make sure you are using the latest version; see https://github.com/ytdl-org/youtube-dl/#user-content-installation on how to update. Be sure to call youtube-dl with the --verbose option and include the complete output. 
Traceback (most recent call last):
  File "C:\Users\DEBOJ\OneDrive\Desktop\opensource projects\ytdl\youtube-dl\youtube_dl\YoutubeDL.py", line 875, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\DEBOJ\OneDrive\Desktop\opensource projects\ytdl\youtube-dl\youtube_dl\YoutubeDL.py", line 982, in __extract_info
    return self.process_ie_result(ie_result, download, extra_info)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\DEBOJ\OneDrive\Desktop\opensource projects\ytdl\youtube-dl\youtube_dl\YoutubeDL.py", line 1016, in process_ie_result
    return self.process_video_result(ie_result, download=download)
           ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\DEBOJ\OneDrive\Desktop\opensource projects\ytdl\youtube-dl\youtube_dl\YoutubeDL.py", line 1752, in process_video_result
    raise ExtractorError('No video formats found!')
youtube_dl.utils.ExtractorError: No video formats found!; please report this issue on https://github.com/ytdl-org/youtube-dl/issues , using the appropriate issue template. Make sure you are using the latest version; see https://github.com/ytdl-org/youtube-dl/#user-content-installation on how to update. Be sure to call youtube-dl with the --verbose option and include the complete output.

On Wed, Mar 19, 2025 at 3:45 PM dirkf ***@***.***> wrote:

> The debugger (import pdb; pdb.set_trace()) is your friend.
>
> - Be sure to review the links (and their linked material) that I posted above. Even the worked example from 2014 or so is still relevant.
> - Did you add your class to extractors/extractors.py?
> - Does your class inherit from InfoExtractor?
> - Does your class have a _VALID_URL that matches the URL of a test page?
> - Does it have a _real_extract() method?
> - Does the method extract an ID from its url argument?
> - Does it fetch the webpage from url and extract the media data from it?
> - If not, does it use some API URL to do that?
> - Does it populate a dictionary with at least id, title and formats values according to the spec in extractors/common.py?
> - Does it call _sort_formats() on the formats?
> - Does it return the dictionary if successful, or else raise ExtractorError?
>
> View it on GitHub: <https://github.com/ytdl-org/youtube-dl/issues/32654#issuecomment-2736043762>
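
Two of the checklist questions above (does `_VALID_URL` match a test page, and can an ID be extracted from the URL) can be checked without running youtube-dl at all. The sketch below is only an illustration: it applies the `_VALID_URL` from the ZenYandex back-port posted later in this thread to example URLs from the issue and from the back-port's tests, using plain `re.match` on the assumption that this mirrors how the extractor matches URLs. The helper name `match_id` is made up for this example.

```python
import re

# _VALID_URL copied from the ZenYandexIE back-port posted in this thread.
VALID_URL = (
    r'https?://(zen\.yandex|dzen)\.ru(?:/video)?/(media|watch)/'
    r'(?:(?:id/[^/]+/|[^/]+/)(?:[a-z0-9-]+)-)?(?P<id>[a-z0-9-]+)'
)


def match_id(url):
    """Hypothetical stand-in for an extractor's ID lookup.

    Returns the 'id' group if the pattern matches at the start of url,
    otherwise None.
    """
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None


# Example URLs taken from the issue report and from the back-port's tests.
for url in (
    'https://dzen.ru/video/watch/6471e50e06863726828b435c',
    'https://dzen.ru/media/id/606fd806cc13cb3c58c05cf5/'
    'vot-eto-focus-dedy-morozy-na-gidrociklah-60c7c443da18892ebfe85ed7',
    'https://dzen.ru/embed/vnVEaPfaSym8?from_block=partner&from=zen&mute=0&autoplay=0&tv=0',
):
    print(match_id(url), url)
```

With this pattern the `/video/watch/` and `/media/` URLs yield an ID, while the `/embed/` URL from the issue yields `None`, so embed links would need a pattern of their own.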

deekerman commented

2026-02-21 14:28:17 -05:00

Author

Owner

@dirkf commented on GitHub (Mar 23, 2025):

Aha! When I first looked at this, redirection to the SSO login wasn't a problem.

What you need to do is pull the ZenYandex stuff from yt-dlp, because dzen.ru and zen.yandex.ru appear to work in the same way. The SSO login can normally be bypassed by following the redirection in the interstitial JS. In this WIP version that I found, _ZY_download_webpage() does this:

ZenYandex back-port with initial dzen.ru support

class ZenYandexBase(InfoExtractor):
    def _ZY_download_webpage(self, url, item_id, *args, **kwargs):
        #                               (url_or_request, video_id, note=None, ...
        webpage = self._download_webpage(url, item_id, *args, **kwargs)
        redirect = self._search_json(r'var\s+it\s*=', webpage, 'redirect', item_id, default={}).get('retpath')
        if redirect:
            item_id = self._match_id(redirect)
            note = 'Redirecting'
            if len(args) > 0:
                args = (note,) + args[1:]
            else:
                kwargs['note'] = note
                kwargs = compat_kwargs(kwargs)
            webpage = self._download_webpage(redirect, item_id, *args, **kwargs)
        return webpage

    def _search_ssrData(self, webpage, name, item_id):
        return self._search_json(
            # initial pattern can appear many times, so disambiguate
            r'var\s+_params\s*=\s*\((?=\s*\{\s*"ssrData")', webpage,
            name, item_id)


class ZenYandexIE(ZenYandexBase):
    _VALID_URL = r'https?://(zen\.yandex|dzen)\.ru(?:/video)?/(media|watch)/(?:(?:id/[^/]+/|[^/]+/)(?:[a-z0-9-]+)-)?(?P<id>[a-z0-9-]+)'
    _TESTS = [{
        'url': 'https://zen.yandex.ru/media/id/606fd806cc13cb3c58c05cf5/vot-eto-focus-dedy-morozy-na-gidrociklah-60c7c443da18892ebfe85ed7',
        'info_dict': {
            'id': '60c7c443da18892ebfe85ed7',
            'ext': 'mp4',
            'title': 'ВОТ ЭТО Focus. Деды Морозы на гидроциклах',
            'description': 'md5:f3db3d995763b9bbb7b56d4ccdedea89',
            'thumbnail': r're:^https://avatars\.dzeninfra\.ru/',
            'uploader': 'AcademeG DailyStream',
        },
        'params': {
            'skip_download': 'm3u8',
            'format': 'bestvideo',
        },
        'skip': 'The page does not exist',
    }, {
        'url': 'https://dzen.ru/media/id/606fd806cc13cb3c58c05cf5/vot-eto-focus-dedy-morozy-na-gidrociklah-60c7c443da18892ebfe85ed7',
        'info_dict': {
            'id': '60c7c443da18892ebfe85ed7',
            'ext': 'mp4',
            'title': 'ВОТ ЭТО Focus. Деды Морозы на гидроциклах',
            'description': 'md5:8684912f6086f298f8078d4af0e8a600',
            'thumbnail': r're:^https://avatars\.dzeninfra\.ru/',
            'uploader': 'AcademeG DailyStream',
            'upload_date': '20191111',
            'timestamp': 1573465585,
        },
        'params': {'skip_download': 'm3u8'},
    }, {
        'url': 'https://zen.yandex.ru/video/watch/6002240ff8b1af50bb2da5e3',
        'info_dict': {
            'id': '6002240ff8b1af50bb2da5e3',
            'ext': 'mp4',
            'title': 'Извержение вулкана из спичек: зрелищный опыт',
            'description': 'md5:053ad3c61b5596d510c9a199dc8ee633',
            'thumbnail': r're:^https://avatars\.dzeninfra\.ru/',
            'uploader': 'TechInsider',
            'timestamp': 1611378221,
            'upload_date': '20210123',
        },
        'params': {'skip_download': 'm3u8'},
    }, {
        'url': 'https://dzen.ru/video/watch/6002240ff8b1af50bb2da5e3',
        'info_dict': {
            'id': '6002240ff8b1af50bb2da5e3',
            'ext': 'mp4',
            'title': 'Извержение вулкана из спичек: зрелищный опыт',
            'description': 'md5:053ad3c61b5596d510c9a199dc8ee633',
            'thumbnail': r're:^https://avatars\.dzeninfra\.ru/',
            'uploader': 'TechInsider',
            'upload_date': '20210123',
            'timestamp': 1611378221,
        },
        'params': {'skip_download': 'm3u8'},
    }, {
        'url': 'https://zen.yandex.ru/media/id/606fd806cc13cb3c58c05cf5/novyi-samsung-fold-3-moskvich-barahlit-612f93b7f8d48e7e945792a2?from=channel&rid=2286618386.482.1630817595976.42360',
        'only_matching': True,
    }, {
        'url': 'https://dzen.ru/media/id/606fd806cc13cb3c58c05cf5/novyi-samsung-fold-3-moskvich-barahlit-612f93b7f8d48e7e945792a2?from=channel&rid=2286618386.482.1630817595976.42360',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._ZY_download_webpage(url, video_id)
        data_json = self._search_json(
            r'data\s*=', webpage, 'old video data', video_id, contains_pattern=r'{["\']_*serverState_*video.+}', default=None)
        if data_json:
            server_state = self._search_regex(r'(_+serverState_+video-site_[^_]+_+)',
                                              webpage, 'server state').replace('State', 'Settings')
            video_json = traverse_obj(data_json, (server_state, 'exportData', 'video', T(dict)))
            stream_urls = traverse_obj(video_json, ('video', 'streams', Ellipsis))
        else:
            data_json = self._search_ssrData(webpage, 'video data', video_id)
            video_json = traverse_obj(data_json, ('ssrData', 'videoMetaResponse'))
            stream_urls = traverse_obj(data_json, (
                'ssrData', 'streamsResponse', 'SingleStream', Ellipsis,
                'StreamInfo', Ellipsis, 'OutputStream'))

        def search_uploader(html):
            uploader = self._search_regex(r'(<a\s(?:[^>]*\s)?class\s*=\s*(?P<q>"|\')(?:[\w$.-]+\s)*card-channel-link(?:\s[\w$.-]+)*(?P=q)[^>]*>)',
                                          html, 'uploader', default='')
            return extract_attributes(uploader).get('aria-label')

        formats = []
        subtitles = {}
        for s_url in stream_urls:
            ext = determine_ext(s_url)
            if ext == 'mpd':
                fmts, subs = self._extract_mpd_formats_and_subtitles(s_url, video_id, mpd_id='dash', fatal=False)
                formats.extend(fmts)
                self._merge_subtitles(subs, target=subtitles)
            elif ext == 'm3u8':
                formats.extend(self._extract_m3u8_formats(s_url, video_id, 'mp4', fatal=False))
        self._sort_formats(formats)

        return merge_dicts({
            'id': video_id,
            'formats': formats,
            'subtitles': subtitles,
        }, traverse_obj(video_json, {
            'title': (('title', T(lambda _: self._og_search_title(webpage))),),
            'description': (('description',
                            T(lambda _: self._og_search_description(webpage, default=None))),),
            'duration': ((None, 'video'), 'duration', T(int_or_none)),
            'view_count': ((None, 'video'), 'views', T(int_or_none)),
            'timestamp': ('publicationDate', T(int_or_none)),
            'uploader': (('source', 'title'), T(lambda _: search_uploader(webpage)),),
            'thumbnail': (('image',
                           T(lambda _: self._og_search_thumbnail(webpage, default=None))),
                          T(url_or_none)),
            'age_limit': ('video', 'adult', T(lambda x: 18 if x else None)),
        }, get_all=False), traverse_obj(data_json, {
            'uploader': (('authorName', ('publisher', 'name')),),
            'description': ('og', 'description'),
            'thumbnail': ('og', 'imageUrl', T(url_or_none)),
        }, get_all=False), traverse_obj(video_json, {
            'tags': ('tags', Ellipsis),
        }))


class ZenYandexChannelIE(ZenYandexBase):
    _VALID_URL = r'https?://(zen\.yandex|dzen)\.ru/(?!media|video)(?:id/)?(?P<id>[a-z0-9-_]+)'
    _TESTS = [{
        'url': 'https://zen.yandex.ru/tok_media',
        'info_dict': {
            'id': 'tok_media',
            'title': 'СПЕКТР',
            'description': 'md5:a9e5b3c247b7fe29fd21371a428bcf56',
        },
        'playlist_mincount': 169,
        'skip': 'The page does not exist',
    }, {
        'url': 'https://dzen.ru/tok_media',
        'info_dict': {
            'id': 'tok_media',
            'title': 'СПЕКТР',
            'description': 'md5:a9e5b3c247b7fe29fd21371a428bcf56',
        },
        'playlist_mincount': 169,
        'skip': 'The page does not exist',
    }, {
        'url': 'https://zen.yandex.ru/id/606fd806cc13cb3c58c05cf5',
        'info_dict': {
            'id': '606fd806cc13cb3c58c05cf5',
            'description': 'md5:517b7c97d8ca92e940f5af65448fd928',
            'title': 'AcademeG DailyStream',
        },
        'playlist_mincount': 657,
    }, {
        # Test that the playlist extractor finishes extracting when the
        # channel has less than one page
        'url': 'https://zen.yandex.ru/jony_me',
        'info_dict': {
            'id': 'jony_me',
            'description': 'md5:a2c62b4ef5cf3e3efb13d25f61f739e1',
            'title': 'JONY ',
        },
        'playlist_count': 18,
    }, {
        # Test that the playlist extractor finishes extracting when the
        # channel has more than one page of entries
        'url': 'https://zen.yandex.ru/tatyanareva',
        'info_dict': {
            'id': 'tatyanareva',
            'description': 'md5:40a1e51f174369ec3ba9d657734ac31f',
            'title': 'Татьяна Рева',
            'entries': 'maxcount:200',
        },
        'playlist_mincount': 46,
    }, {
        'url': 'https://dzen.ru/id/606fd806cc13cb3c58c05cf5',
        'info_dict': {
            'id': '606fd806cc13cb3c58c05cf5',
            'title': 'AcademeG DailyStream',
            'description': 'md5:517b7c97d8ca92e940f5af65448fd928',
        },
        'playlist_mincount': 657,
    }]

    def _entries(self, item_id, server_state_json, server_settings_json):
        server_json = (server_state_json, server_settings_json)
        data = traverse_obj(server_json, (0, 'feed'), (1, 'exportData'),
                            expected_type=lambda x: x['items'][0])

        more = traverse_obj(server_json, (0, 'links', 'more'), (1, 'exportData', 'more', 'link'))

        next_page_id = None
        for page in itertools.count(1):
            for item in traverse_obj(data, ('items', lambda _, v: v.get('type') == 'gif')):
                video_id = traverse_obj(
                    item, 'id',
                    (('publication_id', 'publicationId'), T(lambda s: s.rsplit(':', 1)[-1])),
                    get_all=False) or ''
                yield self.url_result(item['link'], ZenYandexIE.ie_key(), video_id)

            current_page_id = next_page_id
            next_page_id = traverse_obj(parse_qs(more), ('next_page_id', -1))
            if not all((more, next_page_id, next_page_id != current_page_id)):
                break

            data = self._download_json(more, item_id, note='Downloading page {0}'.format(page))
            more = traverse_obj(data, ('more', 'link'))

    def _real_extract(self, url):
        item_id = self._match_id(url)
        webpage = self._ZY_download_webpage(url, item_id)
        data = self._search_json(
            r'var\s+data\s*=', webpage, 'old channel data', item_id, contains_pattern=r'{\"__serverState__.+}', default=None)
        if data:
            server_state_json = traverse_obj(data, lambda k, _: k.startswith('__serverState__'), get_all=False)
            server_settings_json = traverse_obj(data, lambda k, _: k.startswith('__serverSettings__'), get_all=False)
        else:
            data = self._search_ssrData(webpage, 'channel data', item_id)
            server_state_json = {}
            server_settings_json = traverse_obj(data, ('ssrData', 'exportResponse', {'exportData': 'feedData'}))

        return self.playlist_result(
            self._entries(item_id, server_state_json, server_settings_json),
            item_id, traverse_obj(server_state_json, ('channel', 'source', 'title')),
            traverse_obj(server_state_json, ('channel', 'source', 'description')))

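This should be appended to the code in extractor/yandexvideo.py. Check that the module's imports cover the names used here: itertools, compat_kwargs, and the helpers determine_ext, extract_attributes, int_or_none, merge_dicts, parse_qs, T, traverse_obj and url_or_none.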
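
For readers who want to see the two page lookups in isolation, here is a rough standard-library sketch, not the youtube-dl implementation. It assumes that the interstitial page contains `var it = {...}` with a `retpath` key and that the real page contains `var _params = ({"ssrData": ...})`, as the regular expressions in the back-port suggest, and that `SingleStream` and `StreamInfo` are lists. The sample HTML strings, the example hosts and the helper names `json_after`, `find_redirect` and `find_stream_urls` are invented for the example.

```python
import json
import re


def json_after(pattern, text):
    """Decode the JSON value that starts right after the first match of pattern.

    Returns None if the pattern is not found or no valid JSON follows it.
    """
    m = re.search(pattern, text)
    if not m:
        return None
    try:
        # raw_decode() parses one JSON value and ignores whatever follows it.
        obj, _ = json.JSONDecoder().raw_decode(text, m.end())
    except ValueError:
        return None
    return obj


def find_redirect(html):
    """Return the 'retpath' URL from an interstitial page, or None if absent."""
    data = json_after(r'var\s+it\s*=\s*', html)
    return data.get('retpath') if isinstance(data, dict) else None


def find_stream_urls(html):
    """Collect every OutputStream URL under ssrData -> streamsResponse."""
    # The lookahead skips other 'var _params = (' occurrences without ssrData.
    data = json_after(r'var\s+_params\s*=\s*\(\s*(?=\{\s*"ssrData")', html) or {}
    streams = data.get('ssrData', {}).get('streamsResponse', {}).get('SingleStream', [])
    urls = []
    for stream in streams:
        for info in stream.get('StreamInfo', []):
            if info.get('OutputStream'):
                urls.append(info['OutputStream'])
    return urls


# Invented sample pages, shaped like the structures described above.
interstitial = (
    '<script>(function(){var it = {"host": "https://sso.example/", '
    '"retpath": "https://dzen.ru/video/watch/abc123"};})();</script>'
)
page = (
    '<script>var _params = ({"ssrData": {"streamsResponse": {"SingleStream": '
    '[{"StreamInfo": [{"OutputStream": "https://cdn.example/master.m3u8"}, '
    '{"OutputStream": "https://cdn.example/manifest.mpd"}]}]}}});</script>'
)

print(find_redirect(interstitial))
print(find_stream_urls(page))
```

In the back-port the same two steps are handled by `_search_json()` and `traverse_obj()`, which also cope with missing keys and non-list branches; the sketch only shows the shape of the data being walked.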


deekerman commented

2026-02-21 14:28:18 -05:00

Author

Owner

@dirkf commented on GitHub (Apr 7, 2025):

Now: yt-dlp/yt-dlp#12852



Please support dzen.ru quickly #26754
