Youtube link regex fails for youtu.be links. #37

Closed
opened 2026-02-20 22:23:54 -05:00 by deekerman · 2 comments

Originally created by @edelooff on GitHub (Jan 9, 2011).

Given the example link "http://youtu.be/watch?v=COiIC3A0ROM"
and the _VALID_URL regex present in the Jan 7 checkout, the groups() results are as follows:
('http://youtu.be/', 'watch')

The cause is a closing parenthesis that is placed too late. There should be one right after the ".com/" part, closing the group that matches either youtu.be or \w+.youtube.com.

The regular expression that "works for me" is as follows:
_VALID_URL = r'^((?:https?://)?(?:youtu.be/|(?:\w+.)?youtube(?:-nocookie)?.com/)(?:(?:v/)|(?:(?:watch(?:_popup)?(?:.php)?)?(?:\?|#!?)(?:.+&)?v=)))?([0-9A-Za-z_-]+)(?(1).+)?$'

deekerman closed this issue and added the bug label 2026-02-20 22:23:54 -05:00

@edelooff commented on GitHub (Jan 9, 2011):

An annotated version of the above, which is not as verbose as it could be, but attempts some explanation of parts:

_VALID_URL = re.compile(r"""
    ^(                              # Start matching an optional URL.
      (?:https?://)?                # The scheme may be 'http', 'https' or missing.
      (?:youtu.be/|                 # Domain can be youtu.be ...
         (?:\w+.)?youtube           # ... or any youtube subdomain ...
         (?:-nocookie)?.com/)       # ... with or without -nocookie in the domain name.
      (?:(?:v/)|                    # The path might start with 'v/', ...
         (?:(?:watch(?:_popup)?     # ... 'watch', with or without '_popup' ...
            (?:.php)?)?             # ... and might have a trailing '.php'.
            (?:\?|\#!?)             # Info may be in the query or the fragment ('#' escaped for re.VERBOSE) ...
            (?:.+&)?                # ... and may contain leading query arguments,
            v=)))?                  # but it WILL contain a 'v=' for the video ID!
    ([0-9A-Za-z_-]+)                # Capture the video_id (group #2).
    (?(1).+)?$                      # If the video ID was in a URL, match the rest.
""", re.VERBOSE)
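
One caveat when moving a pattern like this into `re.VERBOSE` form: in verbose mode an unescaped `#` outside a character class starts a comment, so a literal hash (as in the `#!` fragment alternative) has to be written as `\#` or `[#]`. The sketch below illustrates this with a small stand-in pattern, not the full `_VALID_URL`; the names `COMPACT`, `VERBOSE_ESCAPED` and `verbose_unescaped_compiles` are made up for the illustration.

```python
import re

# Stand-in pattern (not the full _VALID_URL): '?' or '#' or '#!', then 'v='.
COMPACT = re.compile(r"(?:\?|#!?)v=")

# Verbose equivalent: the hash is escaped so it stays a literal character.
VERBOSE_ESCAPED = re.compile(r"""
    (?:\?|\#!?)   # '?' or a literal hash, optionally followed by '!'
    v=            # then the 'v=' key
""", re.VERBOSE)


def verbose_unescaped_compiles():
    """Try the same pattern in verbose mode WITHOUT escaping the hash."""
    try:
        re.compile(r"""
            (?:\?|#!?)   # the first hash on this line already starts a comment
            v=
        """, re.VERBOSE)
    except re.error:
        # The group is never closed because ')' ended up inside the comment.
        return False
    return True


if __name__ == "__main__":
    for text in ("?v=abc", "#v=abc", "#!v=abc"):
        print(text, bool(COMPACT.search(text)), bool(VERBOSE_ESCAPED.search(text)))
    print("unescaped verbose pattern compiles:", verbose_unescaped_compiles())
```

Escaping the hash keeps the verbose and the single-line pattern equivalent, which makes it easier to compare the two forms side by side.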


@rg3 commented on GitHub (Jan 10, 2011):

Fixed. Thanks.

Reference
starred/youtube-dl-ytdl-org#37