ability to skip n-sig calculation #26818

Open
opened 2026-02-21 13:22:03 -05:00 by deekerman · 9 comments
Owner

Originally created by @wellsyw on GitHub (Jan 9, 2024).

Checklist

  • I'm reporting a site feature request
  • I've verified that I'm running youtube-dl version 2021.12.17
  • I've searched the bugtracker for similar site feature requests including closed ones

Description

The n-sig calculation takes roughly 4-5 seconds each time for me, and it runs twice per video, so solving it makes youtube-dl spend about 10 seconds of CPU time per URL. It would be nice to have an option to skip the calculation when it is not strictly necessary, or even better, automatic detection of whether it is needed. For instance, I'm fairly sure the --get-title, --get-duration and (possibly) -F options do not require solving the signature.

Below is a simple helper script that displays the channel IDs for YouTube URLs; it just spent 42 seconds getting the information for five videos. This is, of course, unacceptable performance.

```
#!/bin/sh
FORMAT='%(id)s %(channel_id)s %(uploader)s'

exec python youtube_dl/__main__.py \
	-o "$FORMAT" --get-filename \
	"$@"
```

Of course, making the calculation about 100 times faster would be preferred, but until that happens, the ability to skip it on demand would be an acceptable substitute.


@dirkf commented on GitHub (Jan 10, 2024):

The problem is that the extractor code doesn't know whether it needs to calculate valid format URLs or not, since this is only known in the calling code (YoutubeDL.extract_info() in youtube_dl/YoutubeDL.py).

To achieve what you suggest with the current YT extraction scheme, we'd have to do one of these:

  1. define a method of YoutubeDL that the extractor code could call to determine whether valid format links are needed, or
  2. define a way of returning the format links as a continuation that is either evaluated by YoutubeDL.process_video_result() (say), or discarded, depending on whether valid format links are needed.
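[Editor's note: option 2 above, deferring the format URLs as a continuation, might look roughly like this minimal sketch. The `LazyFormats` class and all names here are invented for illustration; they are not youtube-dl APIs.]

```python
# Sketch of option 2: the extractor returns format URLs as zero-argument
# thunks; the caller forces them only when real, descrambled URLs are needed.

class LazyFormats:
    """Wraps per-format URL thunks; expensive descrambling runs on demand."""
    def __init__(self, thunks):
        self._thunks = thunks          # list of zero-arg callables
        self._resolved = None

    def resolve(self):
        if self._resolved is None:
            self._resolved = [t() for t in self._thunks]
        return self._resolved


calls = []

def make_thunk(fmt_id):
    def thunk():
        calls.append(fmt_id)           # stands in for the n-sig descramble
        return 'https://example.invalid/video?itag=%d' % fmt_id
    return thunk

formats = LazyFormats([make_thunk(i) for i in (18, 22)])
# a --get-title style code path never resolves, so the n-sig cost is skipped
assert calls == []
# the download path (process_video_result(), say) would force them here
urls = formats.resolve()
```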

Or the extraction scheme could be modified so that a different YT client could be used, as in yt-dlp and as already used for age-gated videos. But that client gets fewer formats, albeit unthrottled ones.

The n-sig processing is meant to cache results: when the same n-sig value is seen in a whole lot of formats for one video it should only be computed once.
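[Editor's note: the caching described above amounts to memoizing on the challenge string. A minimal sketch, with `descramble_n` standing in for running the extracted player JS:]

```python
# One distinct n-sig challenge shared by many formats of a video is only
# descrambled once; later lookups hit the cache.
import functools

@functools.lru_cache(maxsize=None)
def descramble_n(n_challenge):
    # placeholder for interpreting the player JS challenge function
    return n_challenge[::-1]

# 20 formats, one distinct challenge: the expensive call runs exactly once
results = [descramble_n('abc123') for _ in range(20)]
assert descramble_n.cache_info().misses == 1
assert descramble_n.cache_info().hits == 19
```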

I tried this command:

```console
$ time python -m youtube_dl -o '%(id)s %(channel_id)s %(uploader)s' test:YouTube --get-filename
```

On this machine, low-spec by today's standards, distro Py3.11 seems to be almost 3x as fast as the miniconda Py2.7 (on another machine, distro Py2.7 is much closer to PPA Py3.9). Disabling n-sig processing cuts execution time from 9s to 3s with Py3 and 26s to 5s with Py2.

If someone were to try profiling the current code (we did that when the n-sig processing was first implemented), it might indicate some unsuspected hog.

One known execution time driver is that, whenever YT changes its player JS, we have to fetch that, a 2MB download, in addition to the bloated page and/or API JSON. This is still going to be a small part of the total run time with a typical modern internet connection.


@wellsyw commented on GitHub (Jan 11, 2024):

If I understood your response correctly, you say that:

  1. the code is decoupled enough that passing a command-line argument through to the relevant code would be somewhat difficult to implement, and
  2. upgrading the Python version may or may not give a performance boost.

A third, hackish approach would be trivial to implement: an environment variable could easily be passed to the code, and achieve the same effect, I guess.
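[Editor's note: the environment-variable hack could look like this sketch; the variable name `YTDL_SKIP_NSIG` is invented here, and skipping the descramble leaves the throttled/403-prone `n` value in place.]

```python
# Hypothetical gate: the extractor consults an environment variable
# before doing any n-sig work.
import os

def maybe_descramble(n, slow_descramble):
    if os.environ.get('YTDL_SKIP_NSIG') == '1':
        return n                      # leave n untouched; downloads may throttle
    return slow_descramble(n)

os.environ['YTDL_SKIP_NSIG'] = '1'
assert maybe_descramble('abc', lambda s: s[::-1]) == 'abc'
```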

For what it's worth, I have an Athlon II ("Rana" core) so it is not the newest thing around. Using youtube with the new 'polymer' interface is often unbearably slow, so I mostly use youtube-dl to make up for it.

Anyway, this is digressing but the normal runtime for youtube-dl with --simulate for me is about 13 seconds (so n-sig calculation takes ~70% of runtime), but I also found some videos where the runtime is much longer, 25-30 seconds or more per video and it is not caused by the n-sig processing, but rather something that happens between the two instances of n-sig solving, judging from a debug print or two. But I'll just file a new bug for that.


@dirkf commented on GitHub (Jan 16, 2024):

Please try #32695, or a new nightly build that incorporates it after I merge it.


@wellsyw commented on GitHub (Jan 16, 2024):

Well, I'll be. Three seconds or a bit under.

But, er, all downloads seem to be throttled now?

I spotted a &n=%3Cfunction+inner+at+0x808a01d70%3E& in the url.
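[Editor's note: that fragment URL-decodes to a Python function repr, i.e. the descrambling callable itself was interpolated into the URL instead of its result:]

```python
from urllib.parse import unquote_plus

decoded = unquote_plus('%3Cfunction+inner+at+0x808a01d70%3E')
assert decoded == '<function inner at 0x808a01d70>'
```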


@wellsyw commented on GitHub (Aug 19, 2024):

FWIW, I switched to Python 3 and the total runtime (for one video) went down by a second and a half. That's rather significant, I guess. 30 seconds for five videos.


@dirkf commented on GitHub (Aug 19, 2024):

Although it doesn't affect your use case of collecting IDs, un-descrambled n-sig now gives 403 instead of just throttling.


@wellsyw commented on GitHub (Apr 29, 2025):

Starting a week or two ago, a single run now takes 17 seconds, or 50-60% longer. I guess they are serving heavier javascript now... or my computer has mysteriously slowed down.


@dirkf commented on GitHub (Apr 29, 2025):

Maybe the result of https://github.com/ytdl-org/youtube-dl/issues/32905#issuecomment-2408915296 will be a bit faster when I get back to it. There are probably efficiencies to be made with the current code but I don't want to duplicate effort.

FYI the latest YT player code has a gigantic declaration line that we have to handle in order to run the challenge code. Maybe this is what's slower, or it could be the code search (some, discarded, search patterns would hang under test), or something new in the JSInterpreter.


@dirkf commented on GitHub (May 2, 2025):

Testing the patch for #33123 on a basic 32-bit Linux machine, with test/test_youtube_signature.py, which exercises the signature and n-sig code quite a lot, Py2.7 (~830s) is a bit more than twice as slow as Py3.11 (~380s). But 3.11 is itself somewhat faster than previous Py3 versions.
