JSON can not be parsed for Reddit videos #25633

Open
opened 2026-02-21 13:53:37 -05:00 by deekerman · 5 comments

Originally created by @rasoolZero on GitHub (Dec 13, 2022).

Checklist

  • I'm reporting a broken site support
  • I've verified that I'm running youtube-dl version 2021.12.17
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones

Verbose log

[debug] Encodings: locale cp1252, fs utf-8, out utf-8, pref cp1252
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.10.3 (CPython) - Windows-10-10.0.19042-SP0
[debug] exe versions: ffmpeg 4.3.1, ffprobe 4.3.1
[debug] Proxy map: {}
ERROR: zfj2pv: Failed to parse JSON  (caused by JSONDecodeError('Expecting value: line 2 column 5 (char 5)')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Description

I'm trying to get the duration of a video on reddit.com using youtube-dl, but the JSON cannot be parsed.
It works with the command-line executable, but my main goal is to get the duration in a Python script, and there it crashes completely.

Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 906, in _parse_json
    return json.loads(json_string)
  File "C:\Python310\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Python310\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python310\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 5 (char 5)
Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 906, in _parse_json
    return json.loads(json_string)
  File "C:\Python310\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Python310\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python310\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 5 (char 5)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\reddit.py", line 106, in _real_extract
    data = self._download_json(
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 895, in _download_json
    res = self._download_json_handle(
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 881, in _download_json_handle
    return self._parse_json(
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 910, in _parse_json
    raise ExtractorError(errmsg, cause=ve)
youtube_dl.utils.ExtractorError: zfj2pv: Failed to parse JSON  (caused by JSONDecodeError('Expecting value: line 2 column 5 (char 5)')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 906, in _parse_json
    return json.loads(json_string)
  File "C:\Python310\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Python310\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python310\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 5 (char 5)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\reddit.py", line 106, in _real_extract
    data = self._download_json(
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 895, in _download_json
    res = self._download_json_handle(
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 881, in _download_json_handle
    return self._parse_json(
  File "C:\Python310\lib\site-packages\youtube_dl\extractor\common.py", line 910, in _parse_json
    raise ExtractorError(errmsg, cause=ve)
youtube_dl.utils.ExtractorError: zfj2pv: Failed to parse JSON  (caused by JSONDecodeError('Expecting value: line 2 column 5 (char 5)')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 808, in extract_info
    return self.__extract_info(url, ie, download, extra_info, process)
  File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 824, in wrapper
    self.report_error(compat_str(e), e.format_traceback())
  File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 628, in report_error
    self.trouble(error_message, tb)
  File "C:\Python310\lib\site-packages\youtube_dl\YoutubeDL.py", line 598, in trouble
    raise DownloadError(message, exc_info)
youtube_dl.utils.DownloadError: ERROR: zfj2pv: Failed to parse JSON  (caused by JSONDecodeError('Expecting value: line 2 column 5 (char 5)')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

@dirkf commented on GitHub (Dec 14, 2022):

We'll need to know the video URL to debug the problem. Update the description with the full verbose log.


@rasoolZero commented on GitHub (Dec 15, 2022):

> We'll need to know the video URL to debug the problem. Update the description with the full verbose log.

I actually included the whole verbose log; it comes from the embedded Python library.
Here's the link I used to test: https://new.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/

And the code I used:


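import youtube_dl

# "no_warnings" is the documented option key (the original post used "no_warning")
ydl_opts = {"no_warnings": True, "forceduration": True, "quiet": True, "verbose": True}

with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    url = "https://new.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/"
    dictMeta = ydl.extract_info(url, download=False)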

@dirkf commented on GitHub (Dec 15, 2022):

If custom code isn't working it's probably better to report the problem using the yt-dl main program, so in your case:

python -m youtube_dl -v [other options] 'https://new.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/'

The lack of the normal program diagnostics is why I asked for the full log. At least it would have been helpful to mention how yt-dl was being invoked.

Anyhow, the main program reproduces the problem but https://old.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/ works (so, just like any normal use of Reddit). So does https://www.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/.
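Since the `www` host works, a calling script could swap the host before handing the URL to yt-dl until the extractor is fixed. The following is only a minimal sketch of such a caller-side workaround; the helper name `to_www_reddit` is hypothetical and not part of youtube-dl.

```python
from urllib.parse import urlsplit, urlunsplit


def to_www_reddit(url):
    """Return url with a new.reddit.com host swapped for www.reddit.com.

    Hypothetical helper for illustration; other hosts are left untouched.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() == "new.reddit.com":
        parts = parts._replace(netloc="www.reddit.com")
    return urlunsplit(parts)


# The rewritten URL is what would be passed to extract_info() or the CLI.
print(to_www_reddit("https://new.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/"))
```

Only the host is changed; the path, query and fragment are passed through as they were.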

[Update]
The problem is that the extractor fetches the JSON from the matched URL up to the end of the URL path component, without any trailing / or other URL path terminator, and with /.json appended. The resulting new.reddit.com/.../.json URL returns a page of HTML, whereas fetching new.reddit.com/...//.json with wget does return the expected JSON. With old or www in the host field, the JSON is returned for both single and double /.
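To make the URL construction concrete, here is a small sketch that applies the extractor's `_VALID_URL` pattern (as it appears in youtube-dl 2021.12.17) to the reported link. It assumes the JSON URL is formed simply by appending `/.json` to the `url` group, as described above; the function name `json_url` is made up for this illustration.

```python
import re

# Pre-patch pattern from RedditRIE in youtube_dl/extractor/reddit.py
VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?reddit\.com/r/[^/]+/comments/(?P<id>[^/?#&]+))'


def json_url(page_url):
    """Return (JSON look-up URL, video id) for a Reddit comments page URL."""
    mobj = re.match(VALID_URL, page_url)
    url, video_id = mobj.group('url', 'id')
    # Assumption: '/.json' is appended to the matched URL, which has no trailing '/'
    return url + '/.json', video_id


print(json_url('https://new.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/'))
# ('https://new.reddit.com/r/videomemes/comments/zfj2pv/.json', 'zfj2pv')
```

The host of the look-up URL is whatever the user supplied, which is why the `new` host ends up being queried with a single `/` before `.json`.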


@dirkf commented on GitHub (Dec 16, 2022):

As the extractor needs a tweak to support new.reddit.com properly, I'll keep this open. We may as well normalise the JSON look-up to www.reddit.com, unless anyone has evidence that (www|new|old).reddit.com present different media and/or metadata.


@dirkf commented on GitHub (Jan 28, 2023):

--- old/youtube_dl/extractor/reddit.py
+++ new/youtube_dl/extractor/reddit.py
@@ -50,7 +50,7 @@
 
 
 class RedditRIE(InfoExtractor):
-    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?reddit\.com/r/[^/]+/comments/(?P<id>[^/?#&]+))'
+    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<burl>reddit\.com/r/[^/]+/comments/(?P<id>[^/?#&]+)))'
     _TESTS = [{
         'url': 'https://www.reddit.com/r/videos/comments/6rrwyj/that_small_heart_attack/',
         'info_dict': {
@@ -99,7 +99,8 @@
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        url, video_id = mobj.group('url', 'id')
+        url, video_id = mobj.group('burl', 'id')
+        url = 'https://www.' + url
 
         video_id = self._match_id(url)
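
The effect of the patch can be checked outside the extractor. The sketch below copies the patched `_VALID_URL` and the two changed lines of `_real_extract` into a standalone function; the name `normalised` is made up for this example, and the code is an illustration rather than the extractor itself.

```python
import re

# Patched pattern: 'burl' captures the URL without scheme or subdomain
VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<burl>reddit\.com/r/[^/]+/comments/(?P<id>[^/?#&]+)))'


def normalised(page_url):
    """Return (URL rebuilt on the www host, video id), mirroring the patched lines."""
    mobj = re.match(VALID_URL, page_url)
    burl, video_id = mobj.group('burl', 'id')
    return 'https://www.' + burl, video_id


# www, new and old hosts should all collapse to the same look-up base
for host in ('www', 'new', 'old'):
    print(normalised('https://%s.reddit.com/r/videomemes/comments/zfj2pv/is_this_w_rizz/' % host))
```

All three inputs yield the same `https://www.reddit.com/...` base, so the JSON look-up no longer depends on which subdomain the user pasted.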
 