Custom filename transformations (was: Filename: I want to restrict characters, but permit spaces) #4099

Open
opened 2026-02-21 01:50:05 -05:00 by deekerman · 14 comments

Originally created by @keybounce on GitHub (Feb 23, 2015).

Right now, restricting filenames to reasonable characters also prevents spaces.

Yeah, I know, most Unix scripts etc. can't handle spaces, but most modern GUI tools can.


@phihag commented on GitHub (Feb 23, 2015):

If you are using a modern GUI tool, why are you restricting characters in the first place?


@keybounce commented on GitHub (Feb 23, 2015):

Because I want readable file names.

The code of modern GUI programs may be 8-bit smart, but I want to be able to read filenames.

And, as long as the only issue is spaces, most of what I want to do from the command line works -- bash escapes spaces. Just as long as it's either a real program, or a well-written script (and I've gotten quite good with the whole "$var" syntax all over everything ... yuck, why wasn't this in the shell design from day one?)


@phihag commented on GitHub (Feb 24, 2015):

Can you elaborate on why filenames would get more readable when you pass in --restrict-filenames? I'd very much prefer the first filename:

$ youtube-dl --get-filename F59zpvPg3i0
新年快樂 - 小虎隊、憂歡派對 (1989)-F59zpvPg3i0.mp4
$ youtube-dl --get-filename F59zpvPg3i0 --restrict-filenames
1989-F59zpvPg3i0.mp4

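As it stands, this issue is lacking context (https://github.com/rg3/youtube-dl/blob/master/README.md#is-there-enough-context-in-your-bug-report), and without context, we usually close issues. So please do provide a more detailed example of why you'd wanna restrict filenames in the first place.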


@keybounce commented on GitHub (Mar 14, 2015):

Here is a directory listing

keybounceMBP:Etho michael$ ls
total 5993416
257872 Etho Plays Minecraft - Episode 390 Connected Houses.mp4
224464 Etho Plays Minecraft - Episode 391 River Terraforming.mp4
267000 Etho Plays Minecraft - Episode 392 Book Matrix.mp4
242904 Etho Plays Minecraft - Episode 394 Flying Sheep Farm.mp4
257096 Etho Plays Minecraft - Episode 395 Weird Style.mp4
239696 Etho Plays Minecraft - Episode 396 Hyper Speed Piggy.mp4
261672 Etho Plays Minecraft - Episode 397 Life Changer.mp4
502456 Etho's Modded Minecraft #13 - Bandit Camp.mp4
636848 Etho's Modded Minecraft #14 - Template vs. Blueprint.mp4
418792 Etho's Modded Minecraft #15 - Strange Voices In My Head.mp4
213968 Etho's Modded Minecraft 10 Death Mountain.mp4
252088 Etho's Modded Minecraft 11 Coke Oven Factory.mp4
183384 Etho's Modded Minecraft 12 Digital Miner.mp4
278968 Etho's Modded Minecraft 2 Tropical Fishing Huts.mp4
236544 Etho's Modded Minecraft 3 Favorite Tool.mp4
223208 Etho's Modded Minecraft 4 Smart NPCs.mp4
252792 Etho's Modded Minecraft 5 Mining Ship.mp4
276272 Etho's Modded Minecraft 6 Drilling Machine.mp4
235400 Etho's Modded Minecraft 7 Piston Power.mp4
287200 Etho's Modded Minecraft 8 Messy Closet.mp4
244792 Etho's Modded Minecraft 9 Steampunk City.mp4

Modded Minecraft episodes 13-15 were downloaded with youtube-dl, as 480p (thank you for decoding the DASH data and fetching it); the rest were from a Firefox extension ("Download YouTube Videos as MP4") that fetches the 360p feed. The name change is significant, and breaks programs like mplayer that auto-play the next one, or even the sorting of files in Finder or the command line.
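
To illustrate the sorting problem in the listing above, here is a minimal Python sketch of a "natural" sort key. It is not part of youtube-dl; the function name `natural_key` and the decision to ignore `#` are assumptions made only for this example. It shows why plain lexicographic order puts `10` before `2` and `#13` before both, and how comparing the numbers as integers restores episode order.

```python
import re

def natural_key(name: str):
    """Build a sort key that compares digit runs as integers."""
    # Assumption for this listing: ignore '#' so "#13" and "13" compare alike.
    cleaned = name.replace("#", "")
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd indices are always the digit runs.
    parts = re.split(r"(\d+)", cleaned)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

names = [
    "Etho's Modded Minecraft #13 - Bandit Camp.mp4",
    "Etho's Modded Minecraft 10 Death Mountain.mp4",
    "Etho's Modded Minecraft 2 Tropical Fishing Huts.mp4",
]

for name in sorted(names, key=natural_key):
    print(name)
```

A player or file manager that sorts plain strings would not apply such a key, which is why consistent names at download time matter here.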


@yan12125 commented on GitHub (Oct 12, 2017):

Other similar ideas:

  1. Keep ampersands and parentheses (@active8, #4549)
  2. Strip emojis (@sayem314, #14474)

In my opinion, 2. is still a valid request even in 2017. Quite a few emojis are not in the Basic Multilingual Plane (BMP) of Unicode. In other words, applications need to support at least UCS-4 (UTF-32) to handle them correctly. Besides Android, Konsole/QTerminal don't work well either. They use Qt's QString, which is UCS-2 internally.

Update: iTerm2 goes crazy with some emojis, too 😆
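
As a rough illustration of the BMP point, the following Python 3 sketch drops every code point above U+FFFF from a title. The helper name `strip_non_bmp` is hypothetical and this is not an existing youtube-dl feature. Note that it leaves BMP characters alone, including CJK text and the few emoji-like symbols that live inside the BMP.

```python
def strip_non_bmp(text: str) -> str:
    """Drop every code point above U+FFFF, i.e. outside the BMP."""
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)

# U+1F606 is outside the BMP and is removed; the CJK text stays.
print(strip_non_bmp("新年快樂 \U0001F606 (1989)"))
```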


@vlakoff commented on GitHub (Aug 29, 2018):

On some videos I'm downloading, I encounter emojis (example: https://www.youtube.com/watch?v=8voV7_gBMFU) or UTF-8 accents (example: https://www.youtube.com/watch?v=TNxIEIiFaCo).

Because of these characters, I'm encountering the following issues:

  • Error when copying the file to my phone (USB cable)
  • "File not found" errors on some Batch scripts

So, I searched a bit and found the --restrict-filenames option. But it unnecessarily replaces spaces with underscores, which is much more visually bloated. It also removes valid ASCII characters such as &, !, etc.

I suggest adding an --ascii-filenames option that would just produce fully ASCII filenames, without doing any other transformation.
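
A minimal sketch of what such an ASCII-only transformation could look like in Python. The `--ascii-filenames` option is only a proposal in this thread, and the function name `ascii_filename` is hypothetical. The sketch assumes that accented letters should be reduced to their base letter and that anything else outside ASCII is simply dropped. Spaces, `&` and `!` are kept.

```python
import unicodedata

def ascii_filename(name: str) -> str:
    """Reduce accents to base letters and drop all other non-ASCII characters."""
    # NFKD splits "é" into "e" plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", name)
    # Encoding with errors="ignore" drops the accents, emojis and other non-ASCII.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse the runs of whitespace left behind by removed characters.
    return " ".join(ascii_only.split())

print(ascii_filename("Café & Crème \U0001F606 !.mp4"))
```

This does not handle characters that are ASCII but invalid on a given filesystem, such as `/`.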


@dantheman213 commented on GitHub (May 21, 2019):

Any update on this issue?


@Kochise commented on GitHub (Oct 31, 2019):

Files are not downloaded if there is a # in the filename. Would it be possible to have an option that restricts only illegal characters?
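
A hedged Python sketch of "restrict only illegal characters". The character set below is the one Windows filesystems reject, which is an assumption about what "illegal" should mean here; the helper name `replace_illegal` is hypothetical. As far as I know, `#` is a legal filename character on common filesystems, so this sketch keeps it.

```python
import re

# Characters rejected by Windows filesystems, plus ASCII control characters.
# '/' is also the path separator on Unix-like systems.
ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

def replace_illegal(name: str, replacement: str = "_") -> str:
    """Replace only characters that cannot appear in a filename."""
    cleaned = ILLEGAL.sub(replacement, name)
    # Windows also rejects names that end in a space or a dot.
    return cleaned.rstrip(" .")

print(replace_illegal('Etho #13: "Bandit Camp"?.mp4'))
```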


@frgmntdmmrs commented on GitHub (Dec 12, 2019):

Chipping in to also say that I'd love to see this feature to strip down emojis or add filters to output file names. Sometimes clips being downloaded have emojis in their titles and it tends to cause weird issues with other software, such as not detecting the file or crashing.


@shillshocked commented on GitHub (Jan 2, 2020):

How is this issue solvable?


@nestukh commented on GitHub (Apr 3, 2020):

To strip most emojis from the final filename, add this --exec switch (it's one single line):

--exec "python -B -c \"\$(printf %b 'import os,sys,re,shutil; shutil.move(sys.argv[1],re.sub(re.compile(\"([\\U0001F1E0-\\U0001F1FF,\\U0001F300-\\U0001F5FF,\\U0001F600-\\U0001F64F,\\U0001F680-\\U0001F6FF,\\U0001F700-\\U0001F77F,\\U0001F780-\\U0001F7FF,\\U0001F800-\\U0001F8FF,\\U0001F900-\\U0001F9FF,\\U0001FA00-\\U0001FA6F,\\U0001FA70-\\U0001FAFF,\\U00002702-\\U000027B0,\\U000024C2-\\U0001F251])\"), r\"\", sys.argv[1]))')\" {}"

This was tested on the penguinean operating system (GNU/Linux).
Other possible emoji blocks are listed at https://en.wikipedia.org/wiki/Emoji#Unicode_blocks

Credits:
Code derived from and original work by: https://gist.github.com/Alex-Just/e86110836f3f93fe7932290526529cd1#gistcomment-3208085

Extra:
--exec uses sh. When running it from bash on the command line instead, with the filename in $VARIABLE (again, it's one single line):

set +H; python -B -c "$(printf %b 'import os,sys,re,shutil; shutil.move(sys.argv[1],re.sub(re.compile("([\\U0001F1E0-\\U0001F1FF,\\U0001F300-\\U0001F5FF,\\U0001F600-\\U0001F64F,\\U0001F680-\\U0001F6FF,\\U0001F700-\\U0001F77F,\\U0001F780-\\U0001F7FF,\\U0001F800-\\U0001F8FF,\\U0001F900-\\U0001F9FF,\\U0001FA00-\\U0001FA6F,\\U0001FA70-\\U0001FAFF,\\U00002702-\\U000027B0,\\U000024C2-\\U0001F251])"), r"", sys.argv[1]))')" "$VARIABLE"; set -H
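
For readability, the same idea can be sketched as a standalone Python 3 function instead of a shell one-liner. This is a sketch that differs from the commands above in a few ways, not a drop-in copy. The character class has no commas, because inside `[...]` a comma is a literal and would be stripped from filenames. The wide `\U000024C2-\U0001F251` range is left out, because it also covers CJK and many other scripts. The adjacent `\U0001F700` to `\U0001FAFF` blocks are merged into one range. The name `strip_emoji` is hypothetical.

```python
import re

# A partial list of emoji-heavy Unicode ranges, as in the one-liner above.
EMOJI = re.compile(
    "["
    "\U0001F1E0-\U0001F1FF"  # regional indicator symbols (flags)
    "\U0001F300-\U0001F5FF"  # miscellaneous symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F700-\U0001FAFF"  # the six adjacent blocks from the one-liner, merged
    "\U00002702-\U000027B0"  # dingbats
    "]+"
)

def strip_emoji(name: str) -> str:
    """Remove characters in the ranges above and leave everything else alone."""
    return EMOJI.sub("", name)

print(strip_emoji("Funny cats \U0001F606, part 2.mp4"))
```

To rename a file with it, the result could be passed to `shutil.move` as in the original command.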


@nestukh commented on GitHub (Apr 3, 2020):

P.S.
In case someone also uses %(upload_date)s in the output template, e.g.
--output "%(upload_date)s %(uploader)s - %(title)s [%(id)s].%(ext)s"
and would like to convert from YYYYMMDD (20200403) to a more readable format like YYYY-MM-DD (2020-04-03), the correct --exec is:

--exec "python -B -c \"\$(printf %b 'import os,sys,re,shutil; shutil.move(sys.argv[1],re.sub(r\"^(\d{4})(\d{2})(\d{2})\", r\"\\\1-\\\2-\\\3\",re.sub(re.compile(\"([\\U0001F1E0-\\U0001F1FF,\\U0001F300-\\U0001F5FF,\\U0001F600-\\U0001F64F,\\U0001F680-\\U0001F6FF,\\U0001F700-\\U0001F77F,\\U0001F780-\\U0001F7FF,\\U0001F800-\\U0001F8FF,\\U0001F900-\\U0001F9FF,\\U0001FA00-\\U0001FA6F,\\U0001FA70-\\U0001FAFF,\\U00002702-\\U000027B0,\\U000024C2-\\U0001F251])\"), r\"\", sys.argv[1])))')\" {}"

The ^ makes sure that the pattern only matches at the beginning of the filename. Remove it if your template differs from the example above.

It's a simple substitution that can also be done with tools like rename (https://packages.debian.org/stable/rename) in place of sed:
--exec "rename 's/^(\d{4})(\d{2})(\d{2})/\$1-\$2-\$3/' {}"
but you cannot use multiple --exec switches with the {} wildcard to change the same file several times. Also, rename is not installed by default, while the Python code above can take advantage of the same local virtualenv in which youtube_dl is installed.
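
The date substitution on its own can be sketched in plain Python as follows. The helper names `dash_date` and `dash_date_path` are hypothetical. The second helper assumes that `{}` may expand to a path that includes a directory, in which case the `^` anchor has to be applied to the last path component only.

```python
import os
import re

# YYYYMMDD at the very start of the filename, as produced by %(upload_date)s.
DATE_PREFIX = re.compile(r"^(\d{4})(\d{2})(\d{2})")

def dash_date(filename: str) -> str:
    """Turn a leading YYYYMMDD into YYYY-MM-DD; leave other names unchanged."""
    return DATE_PREFIX.sub(r"\1-\2-\3", filename, count=1)

def dash_date_path(path: str) -> str:
    """Apply dash_date to the last path component only."""
    head, tail = os.path.split(path)
    return os.path.join(head, dash_date(tail))

print(dash_date("20200403 uploader - title [id].mp4"))
```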


@lepermagpie commented on GitHub (Jun 8, 2020):

I appreciate the comprehensive scripts, nestukh. Can this be easily replicated on Windows too, as in just replacing the appropriate syntax in the original script?


@nestukh commented on GitHub (Jun 9, 2020):

Probably yes. Also, you will need no changes under WSL (Windows Subsystem for Linux on Windows 10), MSYS2, Cygwin, or other minimal GNU/Linux layer implementations on Windows.
