BFM business video extraction fails #26713

Open
opened 2026-02-21 14:27:06 -05:00 by deekerman · 4 comments
Owner

Originally created by @theedge456 on GitHub (Oct 18, 2023).

Checklist

  • [x] I'm reporting a broken site support issue
  • [x] I've verified that I'm running youtube-dl version 2023.10.13
  • [x] I've checked that all provided URLs are alive and playable in a browser
  • [x] I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • [x] I've searched the bugtracker for similar bug reports including closed ones
  • [x] I've read bugs section in FAQ

Verbose log

[debug] Command-line config: ['https://www.bfmtv.com/economie/replay-emissions/les-experts/les-experts-transition-verte-qui-doit-payer-13-10_VN-202310130321.html', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2023.10.13 [b634ba742] (zip)
[debug] Python 3.11.2 (CPython x86_64 64bit) - Linux-6.1.57-x86_64-with-glibc2.36 (OpenSSL 3.0.11 19 Sep 2023, glibc 2.36)
[debug] exe versions: ffmpeg 5.1.3-1 (setts), ffprobe 5.1.3-1
[debug] Optional libraries: Cryptodome-3.11.0, brotli-1.0.9, certifi-2022.09.24, mutagen-1.46.0, pyxattr-0.8.1, requests-2.28.1, sqlite3-3.40.1, urllib3-1.26.12
[debug] Proxy map: {}
[debug] Request Handlers: urllib
[debug] Loaded 1890 extractors
[bfmtv] Extracting URL: https://www.bfmtv.com/economie/replay-emissions/les-experts/les-experts-transition-verte-qui-doit-payer-13-10_VN-202310130321.html
[bfmtv] 202310130321: Downloading webpage
ERROR: [bfmtv] 202310130321: Unable to extract video block; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "/usr/bin/youtube-dl/yt_dlp/extractor/common.py", line 715, in extract
    ie_result = self._real_extract(url)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/bin/youtube-dl/yt_dlp/extractor/bfmtv.py", line 43, in _real_extract
    video_block = extract_attributes(self._search_regex(
                                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/bin/youtube-dl/yt_dlp/extractor/common.py", line 1263, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)

Description

Extraction has been failing since October 17th, 2023.

Author
Owner

@dirkf commented on GitHub (Oct 19, 2023):

The pattern used by the extractor to find the Brightcove video id is too restrictive:

--- old/youtube-dl/youtube_dl/extractor/bfmtv.py
+++ new/youtube-dl/youtube_dl/extractor/bfmtv.py
@@ -10,7 +10,7 @@
 class BFMTVBaseIE(InfoExtractor):
     _VALID_URL_BASE = r'https?://(?:www\.)?bfmtv\.com/'
     _VALID_URL_TMPL = _VALID_URL_BASE + r'(?:[^/]+/)*[^/?&#]+_%s[A-Z]-(?P<id>\d{12})\.html'
-    _VIDEO_BLOCK_REGEX = r'(<div[^>]+class="video_block"[^>]*>)'
+    _VIDEO_BLOCK_REGEX = r'(<div\s[^>]*\bclass\s*=\s*["\'](?:[\S]\s+)*video_block\b[^>]+>)'
     BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'
 
     def _brightcove_url_result(self, video_id, video_block):

Also there are some improvements from the yt-dlp version to add.
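For readers who want to see the difference between the two patterns, here is a minimal sketch that runs both against sample markup. The HTML strings, attribute values, and the helper names `find_video_block` and `parse_attributes` are made up for illustration; `parse_attributes` is a standard-library stand-in and not the extractor's own `extract_attributes`. Only the two regular expressions are taken from the patch above.

```python
import re
from html.parser import HTMLParser

# Patterns copied from the patch above.
OLD_VIDEO_BLOCK_REGEX = r'(<div[^>]+class="video_block"[^>]*>)'
NEW_VIDEO_BLOCK_REGEX = r'(<div\s[^>]*\bclass\s*=\s*["\'](?:[\S]\s+)*video_block\b[^>]+>)'

# Hypothetical markup: the real page source is not shown in this issue.
STRICT_HTML = '<div class="video_block" videoid="6338" accountid="1">'
EXTENDED_HTML = '<div class="video_block video_block--replay" videoid="6338" accountid="1">'
PREFIXED_HTML = '<div class="player video_block" videoid="6338" accountid="1">'


def find_video_block(pattern, html):
    """Return the opening <div ...> tag matched by pattern, or None."""
    match = re.search(pattern, html)
    return match.group(1) if match else None


class _AttrParser(HTMLParser):
    """Collects the attributes of the last start tag it sees."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        self.attrs = dict(attrs)


def parse_attributes(tag_html):
    """Stand-in for an attribute extractor: parse one opening tag into a dict."""
    parser = _AttrParser()
    parser.feed(tag_html)
    parser.close()
    return parser.attrs


if __name__ == '__main__':
    for label, html in (('strict', STRICT_HTML),
                        ('extended', EXTENDED_HTML),
                        ('prefixed', PREFIXED_HTML)):
        old = find_video_block(OLD_VIDEO_BLOCK_REGEX, html)
        new = find_video_block(NEW_VIDEO_BLOCK_REGEX, html)
        print(label, 'old:', old is not None, 'new:', new is not None)
```

In this sketch the old pattern only matches when the class attribute is exactly `video_block`, while the new one also accepts further class names after it. Note that `(?:[\S]\s+)*` as written only skips single-character tokens, so a multi-character class name placed before `video_block` (the "prefixed" sample) still does not match; whether the site ever emits such markup is not known from this thread.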

Author
Owner

@theedge456 commented on GitHub (Oct 19, 2023):

@dirkf
It's fixed now.
Thanks for the patch

Author
Owner

@dirkf commented on GitHub (Oct 19, 2023):

Thanks. I'll keep it open until the patch is committed.

Author
Owner

@mycodedoesnotcompile2 commented on GitHub (Nov 1, 2023):

When will this patch be merged?
