[Feature Request] parallel downloads feature using a batch file #3003

Closed
opened 2026-02-21 01:02:25 -05:00 by deekerman · 23 comments

Originally created by @Rush2088 on GitHub (Sep 13, 2014).

Is there a way to enable parallel download sessions fed by a `--batch-file`? Some sites have very slow bandwidth per video, so instead of waiting for the previous file to finish, it is much more efficient to run downloads in parallel.

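For context, this is the sequential baseline the request wants to parallelise (a minimal sketch; `urls.txt` is a hypothetical file with one URL per line):

```sh
# Sequential today: --batch-file reads URLs one per line and downloads
# them one after another, so a single slow host stalls the whole run.
youtube-dl --batch-file urls.txt
```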
deekerman closed this issue and added the question label (2026-02-21).

@optikfluffel commented on GitHub (Dec 19, 2014):

This would be nice for playlists too, for example on YouTube :)


@ffigiel commented on GitHub (Feb 21, 2016):

👍 It's doable with external downloaders, but that's an annoying hurdle

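For readers wondering what "external downloaders" means here: youtube-dl can hand each file to a tool such as aria2c via `--external-downloader`. A minimal sketch (the aria2c arguments are illustrative, not tuned); note this parallelises connections within one file, not the files of a batch, which is the hurdle being described:

```sh
# Hand each download to aria2c, which opens up to 8 parallel connections
# per file (-x), split into 8 segments (-s) of at least 1 MiB each (-k).
# Files from the batch are still processed one at a time.
youtube-dl --external-downloader aria2c \
           --external-downloader-args '-x 8 -s 8 -k 1M' \
           --batch-file urls.txt
```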

@schmod commented on GitHub (Apr 21, 2017):

Is there any reason why `xargs` would not be sufficient for this?


@yan12125 commented on GitHub (Apr 22, 2017):

> Is there any reason why `xargs` would not be sufficient for this?

Some cases: custom output templates, postprocessing...


@Roman2K commented on GitHub (Nov 8, 2017):

I would rather have the complexity of parallel downloads kept out of youtube-dl so that its code base remains more maintainable.

Parallelism can already be delegated to `xargs -P x -n y`, where `x` is the number of parallel downloads and `y` is the number of URLs/IDs to pass to youtube-dl, most likely `1`. I find this a very elegant solution: composing various tools, each one doing a single job well. I've used it successfully many times, saving me hours or days.

@yan12125

> Some cases: custom output templates, postprocessing...

Maybe I'm missing something, but why can't this be done with `xargs youtube-dl`, passing the same arguments (for templates, postprocessing, ...) as you would to `youtube-dl`?


@yan12125 commented on GitHub (Nov 8, 2017):

From the viewpoint of youtube-dl internals, a series of URLs is no different from a playlist with several videos. I was considering the latter case when I wrote the previous comment. If you want to download the videos of a playlist in parallel and also postprocess them, you'll need to run the parallel download step - either via internal code or external tools - from within youtube-dl, instead of running youtube-dl in parallel; the postprocessing step can then continue inside youtube-dl.

Or move the postprocessing code into standalone commands. That would be interesting.


@siddht4 commented on GitHub (Nov 28, 2017):

After a bit of research, I found that https://github.com/MrS0m30n3/youtube-dl-gui already supports this in serial form; with some tinkering it could support the parallel form.

https://pypi.python.org/pypi/twodict may help.

https://github.com/MrS0m30n3/youtube-dl-gui/blob/master/youtube_dl_gui/downloadmanager.py#L106-L122 manages the current state.

https://github.com/MrS0m30n3/youtube-dl-gui/blob/master/youtube_dl_gui/downloadmanager.py#L239-L385 manages the queue, which could be modified to work in parallel.

So the process already exists; a little tinkering could bring it to the main project.


@epitron commented on GitHub (Dec 11, 2018):

> Is there any reason why `xargs` would not be sufficient for this?

If you use `xargs`, it's nontrivial to include the numerical playlist index in the filename (for playlists where order matters).

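One possible workaround, sketched under stated assumptions (GNU `nl` and `xargs`, `jq` installed, `$PLAYLIST_URL` as a placeholder): number the flat-playlist output yourself and feed index/URL pairs to `xargs`, so each invocation gets its index as a filename prefix.

```sh
# Emit "001 https://youtu.be/..." pairs; xargs takes two arguments at a
# time: $1 is the zero-padded index, $2 the URL. The index becomes a
# literal prefix of the output filename.
youtube-dl -j --flat-playlist "$PLAYLIST_URL" \
  | jq -r '"https://youtu.be/" + .url' \
  | nl -n rz -w 3 -s ' ' \
  | xargs -n 2 -P 4 sh -c 'youtube-dl -o "$1 - %(title)s.%(ext)s" "$2"' sh
```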

@ghost commented on GitHub (Dec 12, 2018):

@siddht4 Get the download data using youtube-dl (`--skip-download --dump-json`), write an [aria2-compatible input list](https://aria2.github.io/manual/en/html/aria2c.html#cmdoption-i) (see [ytdl2aria](https://github.com/eladkarako/youtube-dl-p/tree/master/ytdl2aria)), and rely on aria2 for both the queue management and the download (see [youtube-dl-p](https://github.com/eladkarako/youtube-dl-p/)), to get both multipart download and multi-file download.

For example, aria2 with `--split=3 --min-split-size=1M --max-concurrent-downloads=5 --max-connection-per-server=16` will give you an active download queue of five files, each with three parallel segment downloads at a time. Works well on YouTube. With `--continue=true --allow-overwrite=false --auto-file-renaming=false` you can re-run the aria2 command in case of an error, skipping completed files.

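A rough sketch of that pipeline, under stated assumptions (`jq` installed; the extractor exposes a direct `url` and a computed `_filename` in each `--dump-json` entry; extracted media URLs can expire, so consume the list promptly):

```sh
# Build an aria2 input list: a URI line followed by an indented out= option
# (aria2's input-file format), then let aria2 manage the queue.
youtube-dl --skip-download --dump-json -f best "$PLAYLIST_URL" \
  | jq -r '.url, "  out=\(._filename)"' > aria2.list
aria2c --input-file=aria2.list \
  --split=3 --min-split-size=1M \
  --max-concurrent-downloads=5 --max-connection-per-server=16 \
  --continue=true --allow-overwrite=false --auto-file-renaming=false
```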

@alex-hofmann commented on GitHub (Feb 20, 2019):

Just wanted to thank @Roman2K and provide the full one-liner, since many youtube-dl users may not be familiar with `xargs` or the shell.

> Parallelism can already be delegated to `xargs -P x -n y`, where `x` is the number of parallel downloads and `y` is the number of URLs/IDs to pass to youtube-dl, most likely `1`. I find this a very elegant solution: composing various tools, each one doing a single job well. I've used it successfully many times, saving me hours or days.

```sh
cat files.txt | xargs -n 1 -P 4 youtube-dl
```

Here, `files.txt` is my batch file; `-n 1` passes one URL to each youtube-dl call; `-P 4` runs 4 parallel youtube-dl calls.

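The same pattern extends to shared options: everything after `youtube-dl` on the `xargs` command line is passed to every parallel invocation, which covers the output-template and format cases raised earlier. A sketch assuming GNU xargs (`-a` reads the batch file directly):

```sh
# Each of the 4 workers runs youtube-dl with the same format selection and
# output template, one URL per invocation.
xargs -a files.txt -n 1 -P 4 youtube-dl -f best -o '%(title)s.%(ext)s'
```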

@aryehbeitz commented on GitHub (Feb 22, 2019):

@alex-hofmann Can this be used to download in parallel when downloading a channel or playlist on YouTube?


@PiotrDabrowskey commented on GitHub (Feb 22, 2019):

@aryehbeitz No, I guess not. It only lets you download multiple links in parallel, but you need to specify those links in the file. A playlist or channel is just a single link passed to youtube-dl as you normally would, so it does not solve the problem of this issue.

If there were a way to gather all the links from a playlist and place them into the file, that would do the trick.


@alex-hofmann commented on GitHub (Feb 22, 2019):

@aryehbeitz @PiotrDabrowskey Yeah, I think you're right: this is only useful when you can feed in known links. For example, if someone wanted to download the Jeopardy! episodes that a reddit user graciously uploads daily, they could use something like

```shell
curl -A 'random' https://www.reddit.com/user/jibjabjrjr/.rss 2>/dev/null | grep -Po 'https:\/\/drive.*?(?=\;|\&)' | xargs -n 1 -P 4 youtube-dl
```

My guess is that something similar is possible for pulling the URLs from a YouTube playlist with a little curl+grep.


@Roman2K commented on GitHub (Feb 22, 2019):

@aryehbeitz You can get the individual URLs from a playlist with:

```sh
$ youtube-dl -j --flat-playlist 'https://www.youtube.com/watch?v=CcNo07Xp8aQ&list=PLA671B7E7BFB780B7' | jq -r '"https://youtu.be/"+ .url'
```

That can be fed directly to `xargs` for parallel downloads, as in @alex-hofmann's example.

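Putting the two halves together, a hedged sketch of the full pipeline (assumes `jq` and GNU xargs; `$PLAYLIST_URL` is a placeholder):

```sh
# Expand the playlist to one URL per line, then download 4 videos at a time.
youtube-dl -j --flat-playlist "$PLAYLIST_URL" \
  | jq -r '"https://youtu.be/" + .url' \
  | xargs -n 1 -P 4 youtube-dl
```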

@Roman2K commented on GitHub (Mar 6, 2019):

If anyone's interested, I made a cronjob for backing up my YouTube playlists to Google Drive (using rclone). It's idempotent/incremental. The scripting is kind of rough, but I've been running it with several 500+-video playlists over the last few days and it works well.

https://github.com/Roman2K/ytdump

  • Basic usage: `ruby dl.rb PLAYLIST_URL`
    • downloads to `./out`
    • keeps metadata/cache in `./meta`
  • Export to Drive/other: see `dl_to_rclone`
    • usage: see `favorites/dl.sh`
  • Cronjob: see `crontab` and `cronjob`
    • depends on [alerterr](https://github.com/Roman2K/alerterr)
  • Concurrency: see `Downloader::NTHREADS` in `dl.rb`

Clone and customize to your needs.


@zer0def commented on GitHub (Apr 3, 2019):

Based on @alex-hofmann's and @Roman2K's hints, here's a trashman's one-liner that should work with any extractor (requires `jq` alongside typical coreutils binaries):

```sh
TMP=`mktemp`; TMPI=0; while ((TMPI++)); read -r; do TMPF="${TMP}.`printf "%05d" "${TMPI}"`"; cat <<<${REPLY} >${TMPF}; echo ${TMPF}; done <<<`youtube-dl -J -u <username> -p <password> <playlist_url> | jq -Mc 'if ._type=="playlist" then .entries[] else . end'` | xargs -I'{}' -n1 -P`nproc` -- /bin/sh -c 'youtube-dl -f best -o "`echo {} | awk -F. '\''{print $NF}'\''`-%(title)s-%(ext)s" --load-info {} &>/dev/null; rm {}'; unset TMP TMPI TMPF
```

Adjust to your heart's content.

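For readers, a more readable near-equivalent of that one-liner, as a sketch rather than a drop-in replacement (assumes GNU `split` and `xargs`; `--load-info`, as in the one-liner above, reuses a dumped info JSON):

```sh
# Dump the whole playlist as JSON, write one info-JSON file per entry,
# then run up to 4 youtube-dl processes, each consuming one info file.
tmpdir=$(mktemp -d)
youtube-dl -J "$PLAYLIST_URL" \
  | jq -Mc 'if ._type=="playlist" then .entries[] else . end' \
  | split -d -a 3 -l 1 - "$tmpdir/entry."
printf '%s\n' "$tmpdir"/entry.* \
  | xargs -P 4 -I{} sh -c 'youtube-dl -f best --load-info "{}" && rm "{}"'
# Leftover entry.* files mark failed downloads; rmdir succeeds only if none.
rmdir "$tmpdir"
```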

@epitron commented on GitHub (Apr 3, 2019):

@zer0def Nice!

Small issue: the number of parallel downloaders is set to the number of CPU cores (`` -P`nproc` ``). That parameter should be tuned to match the network capacity, not the CPU capacity.

Anyone who wants to try it should start at around `-P5` and increase it until it saturates their connection.


@zer0def commented on GitHub (Apr 4, 2019):

@epitron As long as your extractions don't fall back to ffmpeg, it's safe to ramp up the process count; but when they do, you might end up unnecessarily extending your downloads due to threads fighting for CPU time. I picked `nproc` simply because it's fairly idiot-proof while remaining faster than a sequential run on a large playlist.

As per my previous comment: adjust to your heart's content, since everyone's case might be different.


@epitron commented on GitHub (Apr 4, 2019):

@zer0def Hyperthreading can also artificially increase your CPU count. What if you have 32 cores, and a regular network connection?

I thought YTDL only used ffmpeg to copy streams into different containers (unless you pass `--recode-video` or request a specific format that needs to be transcoded). Are there any services that transcode video by default and also support playlists?


@zer0def commented on GitHub (Apr 5, 2019):

Then you have threads stalling for data, which is hardly a concern compared to saturating a low-end and/or low-core-count CPU with a pipe big enough to overwhelm it.


@epitron commented on GitHub (Apr 8, 2019):

@zer0def What if YouTube bans or throttles you for spamming their servers with connections?


@zer0def commented on GitHub (Apr 9, 2019):

Do you have an actual case to provide as a basis, or are we just talking hypotheticals? Because if you're looking for a golden mean, then yes: you could cobble up a proper solution, write a wrapper script, or use `parallel --limit`. All of these go beyond a minimal-dependency "trashman's" case (ideally I would have liked to omit `jq`, but I haven't yet figured out a way to output the necessary information on one line without running into issues with whitespace-delimited paths).

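For completeness, a hedged sketch using GNU parallel in place of xargs (`--jobs` caps concurrency; `--joblog` records each URL's exit code, which makes retries easy):

```sh
# Download 4 videos at a time and log each job's exit status to dl.log.
# Failed URLs can be retried later with: parallel --retry-failed --joblog dl.log
youtube-dl -j --flat-playlist "$PLAYLIST_URL" \
  | jq -r '"https://youtu.be/" + .url' \
  | parallel --jobs 4 --joblog dl.log youtube-dl
```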

@dirkf commented on GitHub (May 16, 2025):

The original question was answered (TL;DR: yes, but) with no specific changes required. Linked issues discuss parallelisation within the program.
