--cookies option, get cookies dynamically #26817

Closed
opened 2026-02-21 14:30:00 -05:00 by deekerman · 12 comments

Originally created by @Iridium-Lo on GitHub (Feb 25, 2024).

Checklist

  • I'm reporting a site feature request
  • I've verified that I'm running youtube-dl version 2021.12.17
  • I've searched the bugtracker for similar site feature requests including closed ones

Description

When having to use the --cookies option for sites with Cloudflare protection:

  • rather than having users install a third-party extension (as stated in the guide) and then manually run it each time they need to download

Suggestion

  • get the cookies dynamically; curl is convenient because it already writes cookies in Netscape format, otherwise you have to convert them yourself, which takes some work
  • capture the cookie in a variable or similar rather than writing it to a file (less work)

I did this in a script I use with curl; here is a module from it:

IFS=$'\n'

getCookies() {
    # visit the site once so curl saves its cookies (Netscape format)
    curl "$site" \
      --silent \
      --output /dev/null \
      --user-agent "$userAgent" \
      --cookie-jar ~/cookies.txt
}

ytdl() {
    youtube-dl \
      --no-part \
      --no-check-certificate \
      --cookies ~/cookies.txt \
      --user-agent "$userAgent" \
      --download-archive arc.txt \
      "$@"
}

downloadSimultaneously() {
    local site urlArray
    IFS=$'\n'
    # exported so the ytdl instances started by parallel can see it
    export userAgent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:123.0) Gecko/20100101 Firefox/123.0'
    site=$1
    urlArray=("$@")

    getCookies "$site"

    parallel -j 0 \
      ytdl ::: "${urlArray[@]}"
}

# the function must be exported to be visible in parallel's subshells
export -f ytdl
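For comparison, the fan-out that GNU parallel provides in the script above can be sketched in Python with just the standard library; the download function below is a stand-in for invoking youtube-dl, not the real thing, and the URLs are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def download(url):
    # stand-in for running youtube-dl on one URL; returning the URL
    # makes the fan-out observable without any network access
    return url

urls = ['https://example.com/video/%d' % i for i in range(1, 4)]

# one worker per pending URL, roughly analogous to `parallel -j 0`
with ThreadPoolExecutor(max_workers=len(urls)) as pool:
    # map preserves input order even though downloads run concurrently
    results = list(pool.map(download, urls))

print(results)
```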

For Python, something like this:

import requests

headers = {
    # the same browser user agent as in the shell script above
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:123.0) Gecko/20100101 Firefox/123.0",
}

# requests captures any Set-Cookie headers from the response
response = requests.get('https://the-site-from-the-download-url', headers=headers)
cookies = response.cookies
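To hand cookies captured that way to youtube-dl's --cookies option, they still have to end up in a Netscape-format file; Python's standard library can write one directly. A sketch, where the cookie name, domain, and value are all illustrative:

```python
import http.cookiejar
import time

# MozillaCookieJar reads and writes the Netscape cookies.txt format,
# the same format curl's --cookie-jar and youtube-dl's --cookies use
jar = http.cookiejar.MozillaCookieJar('cookies.txt')
jar.set_cookie(http.cookiejar.Cookie(
    version=0, name='cf_clearance', value='example-token',
    port=None, port_specified=False,
    domain='.example.com', domain_specified=True, domain_initial_dot=True,
    path='/', path_specified=True, secure=True,
    expires=int(time.time()) + 3600, discard=False,
    comment=None, comment_url=None, rest={}, rfc2109=False))
jar.save(ignore_discard=True)

with open('cookies.txt') as f:
    print(f.readline().strip())  # the file starts with a format header
```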

@dirkf commented on GitHub (Feb 25, 2024):

If you can just replay cookies set by visiting some URL with no user interaction, we can easily do that in the extractor itself.


@Iridium-Lo commented on GitHub (Feb 25, 2024):

I'm not sure what you mean by that.

getCookies gets the new/updated cookies each time downloadSimultaneously is executed.


@dirkf commented on GitHub (Feb 25, 2024):

When yt-dl receives a Set-Cookie header from the site the cookie is stashed in a Python cookielib/http.cookiejar CookieJar accessible to extractor code as the cookiejar attribute of the extractor. The file specified by --cookies ... is updated with the received cookie, and could be passed to a subsequent yt-dl invocation. So yt-dl already includes the same functionality that curl offers (it doesn't have a --download-archive ... option, does it?).

Sometimes a site that rejects requests or redirects requests to a captcha page will send authorisation cookies to bypass this blockage if a specific site page or API URL is visited. An extractor for the site can do that as part of its _real_initialize() method.
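The _real_initialize() pattern described here can be sketched as below. The InfoExtractor class is a minimal stand-in so the example is self-contained; a real extractor would subclass youtube_dl.extractor.common.InfoExtractor, where _download_webpage performs the actual HTTP request and the cookiejar is shared with the downloader. ExampleIE and the URL are hypothetical:

```python
import http.cookiejar


class InfoExtractor:
    # minimal stand-in for youtube-dl's extractor base class
    def __init__(self):
        self.cookiejar = http.cookiejar.CookieJar()

    def _download_webpage(self, url, video_id, note=None):
        # in youtube-dl this performs the HTTP request; any Set-Cookie
        # headers in the response land in self.cookiejar automatically
        return ''


class ExampleIE(InfoExtractor):
    # hypothetical extractor for a site behind anti-bot protection
    def _real_initialize(self):
        # visiting a specific page first primes the authorisation
        # cookies, so the later media requests are not blocked
        self._download_webpage('https://example.com/', None,
                               note='Fetching authorisation cookies')
```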


@Iridium-Lo commented on GitHub (Feb 27, 2024):

oops, obviously curl doesn't have that option; I'll edit that.

So the --cookies option doesn't just read the txt file, it actually writes to it?

Meaning I could just specify --cookies cookies.txt and it would set the cookies in the file? If so, the docs need updating. Or does cookies.txt need to be set once, after which yt-dl refreshes it on every subsequent run?

The only other hurdle is that even with the cookies set, you have to go to the site and get past the Cloudflare protection, or it will still give a 404.

Does yt-dl have a parallel download option like my script?


@dirkf commented on GitHub (Feb 27, 2024):

cookies.txt starts as you wrote it and is updated by yt-dl as new cookies and values are set by the site.

yt-dl doesn't support the syntax/library modules needed for parallel execution, but there is some support for it in yt-dlp.


@Iridium-Lo commented on GitHub (Feb 28, 2024):

so you set cookies.txt once and then don't need to keep setting it.

you say there's some support for parallel downloads?

Can I add a PR for my script?

if you read the original comment, it's a better way of doing things (less work) than creating playlists, aside from the parallel part.


@Iridium-Lo commented on GitHub (Feb 28, 2024):

or add a parallel download option?


@dirkf commented on GitHub (Feb 28, 2024):

Running multiple instances of yt-dl in parallel using the same output directory or download archive (etc) is not really supported. See #350 and the yt-dlp thread linked there.


@Iridium-Lo commented on GitHub (Feb 29, 2024):

Alright, so essentially parallel downloads just aren't going to integrate well with the existing code?

That thread is mostly a discussion between users; I don't see any dev input (I might've missed it). Anyway, for me it's not an issue.

From what you have said it should be easy to carry out this feature request. Could I help?

Caveat

Even when you have the latest cookies you still have to visit the site and get through the Cloudflare protection manually, or it will give a 404, so we (if you point me in the right direction) or the yt-dl maintainers will need to look into that.

Just a note: I install yt-dl from the repo, not via a package manager, so I can get access to branches with fixes before they are merged to master (it can take a while sometimes).


@dirkf commented on GitHub (Feb 29, 2024):

The caveat is the real problem that needs to be solved. yt-dlp hopes to do it with curlCFFI. An implementation of that solution here would require so much shimming and/or imposition of limiting dependencies that just using yt-dlp instead would be more sensible.

As you may observe, anyone can be a dev, but almost all the knowledgeable and active contributors are working on yt-dlp. Features and relevant fixes here generally get pulled downstream, and downstream improvements, especially extractor modules, may also be pulled here. I'm not likely to merge a PR that adds a feature already implemented downstream unless it behaves in the same way (API, CLI) for cases that are covered by both implementations.


@Iridium-Lo commented on GitHub (Feb 29, 2024):

Oh right, I see... That's why I thought yt-dlp is better. I had things the wrong way around; I thought yt-dl was the one with more support.

Although you said yt-dlp has only some parallel support, GNU parallel is also only partial, as it maxes out at 60 instances.

Could you let me know the next best thing (the command) for generating an archive from URLs with yt-dlp (with minimal download), like you did for yt-dl, please?


@Iridium-Lo commented on GitHub (Mar 2, 2024):

ignore that, the archive-generation command is the same. By the way, yt-dlp is much faster.
