Tiktok not get video url sometimes #24697

Open
opened 2026-02-21 13:26:59 -05:00 by deekerman · 18 comments

Originally created by @TechComet on GitHub (Nov 20, 2021).

Checklist

  • I'm reporting a broken site support = yes
  • I've verified that I'm running youtube-dl version 2021.06.06 = yes
  • I've checked that all provided URLs are alive and playable in a browser = yes
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped = yes
  • I've searched the bugtracker for similar issues including closed ones = yes

Verbose log

youtube-dl -g 'https://www.tiktok.com/@aamora_3mk/video/7028702876205632773' -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-g', 'https://www.tiktok.com/@aamora_3mk/video/7028702876205632773', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.06.06
[debug] Python version 3.8.10 (CPython) - Linux-5.11.0-40-generic-x86_64-with-glibc2.29
[debug] exe versions: ffmpeg 4.2.4, ffprobe 4.2.4
[debug] Proxy map: {}
ERROR: Unable to extract data; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/tiktok.py", line 110, in _real_extract
    page_props = self._parse_json(self._search_regex(
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 1012, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
youtube_dl.utils.RegexNotFoundError: Unable to extract data; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Description

Sometimes it works, but sometimes it doesn't!


@dirkf commented on GitHub (Nov 20, 2021):

It looks like TT is sometimes sending a page with unexpected format, such as an error page. The problem URL was successfully extracted several times when I tested just now.

If you can reproduce the error, use --write-pages to save the downloaded HTML, and we can then analyse it.


@TechComet commented on GitHub (Nov 20, 2021):

yt-dl 'https://www.tiktok.com/@aamora_3mk/video/7028702876205632773' --write-pages

aamora_3mk_video_7028702876205632773.dump.zip


@dirkf commented on GitHub (Nov 20, 2021):

Thanks, that shows a new format, and now I'm seeing it too.

It seems that TT is in the middle of switching its framework from NextJS to Sigi, and the persisted state JSON sent in the page is changing as a result. Instead of a <script> element with id __NEXT_DATA__, we get one with id sigi-persisted-data and JSON with a slightly different structure.

This patch deals with both types of page format:

--- old/youtube-dl/youtube_dl/extractor/tiktok.py
+++ new/youtube-dl/youtube_dl/extractor/tiktok.py
@@ -108,6 +108,15 @@
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
         page_props = self._parse_json(self._search_regex(
+            r'''(?s)<script[^>]+\bid=(?P<q>"|'|\b)sigi-persisted-data(?P=q)[^>]+>\s*=\s*(?P<json>{.+?})\s*</script''',
+            webpage, 'sigi data', default='{}', group='json'), video_id)
+        data = try_get(page_props, lambda x: x['ItemModule'][video_id]['video'], dict)
+        if data:
+            data = page_props['ItemModule'][video_id]
+            if data.get('privateItem'):
+                raise ExtractorError('This video is private', expected=True)
+            return self._extract_video(data, video_id)
+        page_props = self._parse_json(self._search_regex(
             r'<script[^>]+\bid=["\']__NEXT_DATA__[^>]+>\s*({.+?})\s*</script',
             webpage, 'data'), video_id)['props']['pageProps']
         data = try_get(page_props, lambda x: x['itemInfo']['itemStruct'], dict)
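
As a standalone illustration of the two page formats discussed in this comment, here is a minimal Python sketch outside youtube-dl. The function names `find_page_state` and `find_video_item` are hypothetical, and the JSON layouts (`ItemModule` for Sigi pages, `props.pageProps.itemInfo.itemStruct` for Next.js pages) are assumed from the patches in this thread rather than from any TikTok documentation. Instead of a non-greedy regex for the JSON body, it decodes the first JSON object after the script tag with `json.JSONDecoder.raw_decode`, which is a swapped-in technique and not what the patch does.

```python
import json
import re

_DECODER = json.JSONDecoder()

# (kind, script id) pairs, tried in order; ids taken from the thread.
_STATE_SCRIPTS = (
    ('sigi', 'sigi-persisted-data'),
    ('next', '__NEXT_DATA__'),
)


def find_page_state(webpage):
    """Return (kind, state) for the first known state script, else (None, {})."""
    for kind, script_id in _STATE_SCRIPTS:
        m = re.search(
            r'<script\s[^>]*\bid\s*=\s*["\']?%s["\']?[^>]*>' % re.escape(script_id),
            webpage)
        if not m:
            continue
        # The JSON object starts at the first '{' after the opening tag
        # (Sigi pages prefix it with an assignment such as window[...]=).
        start = webpage.find('{', m.end())
        if start < 0:
            continue
        try:
            state, _end = _DECODER.raw_decode(webpage, start)
        except ValueError:
            continue
        return kind, state
    return None, {}


def find_video_item(webpage, video_id):
    """Return the per-video dict from either page format, or None."""
    kind, state = find_page_state(webpage)
    if kind == 'sigi':
        return (state.get('ItemModule') or {}).get(video_id)
    if kind == 'next':
        page_props = (state.get('props') or {}).get('pageProps') or {}
        return (page_props.get('itemInfo') or {}).get('itemStruct')
    return None
```

A caller would pass the downloaded HTML and the numeric video id from the URL; a `None` result corresponds to the case where the page carries neither script.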

@TechComet commented on GitHub (Nov 21, 2021):

Why not send update to github ytdl-org/youtube-dl?


@TechComet commented on GitHub (Nov 22, 2021):

I tried these changes, but they don't work.


@dirkf commented on GitHub (Nov 23, 2021):

> Why not send update to github ytdl-org/youtube-dl?

Actually, @wranai had already posted the PR linked above.

My revised patch:

--- old/youtube-dl/youtube_dl/extractor/tiktok.py
+++ new/youtube-dl/youtube_dl/extractor/tiktok.py
@@ -15,10 +15,11 @@
 
 class TikTokBaseIE(InfoExtractor):
     def _extract_video(self, data, video_id=None):
-        video = data['video']
-        description = str_or_none(try_get(data, lambda x: x['desc']))
-        width = int_or_none(try_get(data, lambda x: video['width']))
-        height = int_or_none(try_get(data, lambda x: video['height']))
+        video = try_get(data, lambda x: x['video'], dict)
+        if not video:
+            return
+        width = int_or_none(video.get('width'))
+        height = int_or_none(video.get('height'))
 
         format_urls = set()
         formats = []
@@ -43,30 +44,32 @@
         thumbnail = url_or_none(video.get('cover'))
         duration = float_or_none(video.get('duration'))
 
-        uploader = try_get(data, lambda x: x['author']['nickname'], compat_str)
-        uploader_id = try_get(data, lambda x: x['author']['id'], compat_str)
+        author = data.get('author')
+        if isinstance(author, dict):
+            uploader_id = author.get('id')
+        else:
+            uploader_id = data.get('authorId')
+            author = data
+        uploader = str_or_none(author.get('nickname'))
 
         timestamp = int_or_none(data.get('createTime'))
 
-        def stats(key):
-            return int_or_none(try_get(
-                data, lambda x: x['stats']['%sCount' % key]))
-
-        view_count = stats('play')
-        like_count = stats('digg')
-        comment_count = stats('comment')
-        repost_count = stats('share')
+        stats = try_get(data, lambda x: x['stats'], dict)
+        view_count, like_count, comment_count, repost_count = [
+            stats and int_or_none(stats.get('%sCount' % key))
+            for key in ('play', 'digg', 'comment', 'share', )]
 
         aweme_id = data.get('id') or video_id
 
         return {
             'id': aweme_id,
+            'display_id': video_id,
             'title': uploader or aweme_id,
-            'description': description,
+            'description': str_or_none(data.get('desc')),
             'thumbnail': thumbnail,
             'duration': duration,
             'uploader': uploader,
-            'uploader_id': uploader_id,
+            'uploader_id': str_or_none(uploader_id),
             'timestamp': timestamp,
             'view_count': view_count,
             'like_count': like_count,
@@ -84,11 +87,11 @@
         'info_dict': {
             'id': '6606727368545406213',
             'ext': 'mp4',
-            'title': 'Zureeal',
+            'title': 'md5:24acc456b62b938a7e2dd88e978b20d9',
             'description': '#bowsette#mario#cosplay#uk#lgbt#gaming#asian#bowsettecosplay',
             'thumbnail': r're:^https?://.*',
             'duration': 15,
-            'uploader': 'Zureeal',
+            'uploader': 'md5:24acc456b62b938a7e2dd88e978b20d9',
             'uploader_id': '188294915489964032',
             'timestamp': 1538248586,
             'upload_date': '20180929',
@@ -108,8 +111,17 @@
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
         page_props = self._parse_json(self._search_regex(
-            r'<script[^>]+\bid=["\']__NEXT_DATA__[^>]+>\s*({.+?})\s*</script',
-            webpage, 'data'), video_id)['props']['pageProps']
+            r'''(?s)<script\s[^>]*?\bid\s*=\s*(?P<q>"|'|\b)sigi-persisted-data(?P=q)[^>]*>[^=]*=\s*(?P<json>{.+?})\s*(?:;[^<]+)?</script''',
+            webpage, 'sigi data', default='{}', group='json'), video_id)
+        data = try_get(page_props, lambda x: x['ItemModule'][video_id]['video'], dict)
+        if data:
+            data = page_props['ItemModule'][video_id]
+            if data.get('privateItem'):
+                raise ExtractorError('This video is private', expected=True)
+            return self._extract_video(data, video_id)
+        page_props = self._parse_json(self._search_regex(
+            r'''(?s)<script\s[^>]*?\bid\s*=\s*(?P<q>"|'|\b)__NEXT_DATA__(?P=q)[^>]*>\s*(?P<json>{.+?})\s*</script''',
+            webpage, 'data', group='json'), video_id)['props']['pageProps']
         data = try_get(page_props, lambda x: x['itemInfo']['itemStruct'], dict)
         if not data and page_props.get('statusCode') == 10216:
             raise ExtractorError('This video is private', expected=True)

Apparently some fetch attempts get a page that contains no media information: neither the __NEXT_DATA__ nor the sigi-persisted-data. You just have to try again.
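
The "try again" advice can be sketched as a small retry loop. This is only an illustration under assumptions: `extract_with_retries`, `fetch_page`, and `parse_item` are hypothetical names, not part of youtube-dl, and the attempt count and delay are arbitrary defaults.

```python
import time


def extract_with_retries(fetch_page, parse_item, attempts=3, delay=1.0):
    """Fetch and parse repeatedly until parse_item() yields a truthy result.

    fetch_page: callable returning the page HTML (hypothetical).
    parse_item: callable returning the media info, or None when the page
                contains neither known state script (hypothetical).
    """
    for attempt in range(1, attempts + 1):
        item = parse_item(fetch_page())
        if item:
            return item
        if attempt < attempts:
            # Brief pause before re-fetching; the value is a guess.
            time.sleep(delay)
    return None
```

Returning `None` after the last attempt leaves it to the caller to decide whether an empty page is an error.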


@dirkf commented on GitHub (Dec 13, 2021):

A couple of tests support [report that all attempts were showing __NEXT_DATA__]. Perhaps TikTok was trying out the two different toolkits and has settled on Next.js. No-one is reporting issues with yt-dlp which uses __NEXT_DATA__, though that extractor downloads the page twice, commenting that videos may get 403 otherwise.


@dirkf commented on GitHub (Dec 24, 2021):

See https://medium.com/@szdc/reverse-engineering-the-musical-ly-api-662331008eb3, for example.

[For younger readers like myself, musical.ly is what became tiktok]


@someziggyman commented on GitHub (Jan 6, 2022):

Hey guys,

Sorry to bother you with this, but I just can't seem to make your patch work...
I had no issues with your YouTube throttling patch. It works great and I really appreciate your work!
With this one, I'm just lacking the knowledge or scripting skill.
I'm always getting:
[TikTok] Setting up session
ERROR: Unable to download webpage: [Errno 60] Operation timed out (caused by error(60, 'Operation timed out')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

@dirkf Is there a way to just share a working tiktok.py or update this PR here? https://github.com/ytdl-org/youtube-dl/pull/30224

My apologies for confusion and incompetence.


@dirkf commented on GitHub (Jan 6, 2022):

That looks like a networking issue separate from any tiktok extractor problems.

We could do with a verbose format extraction log (-v -F) of the unmodified 2021.12.17 version (it shouldn't matter if you've updated extractor/youtube.py), and then the same with the extractor/tiktok.py from the PR. I think the PR should fix the same issue that my patch does, though there may be edge cases.


@someziggyman commented on GitHub (Jan 6, 2022):

Well, this is definitely not a network problem: other websites work. The PR does not include a single line of your patch, and it does not work. Here's the log:

./youtube-dl -v -F https://www.tiktok.com/@manoloteachesgolf/video/6987753546036923653
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://www.tiktok.com/@manoloteachesgolf/video/6987753546036923653']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 2.7.16 (CPython) - Darwin-20.6.0-arm64-arm-64bit
[debug] exe versions: none
[debug] Proxy map: {}
[TikTok] Setting up session
ERROR: Unable to download webpage: [Errno 60] Operation timed out (caused by error(60, 'Operation timed out')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
File "./youtube-dl/youtube_dl/extractor/common.py", line 634, in _request_webpage
return self._downloader.urlopen(url_or_request)
File "./youtube-dl/youtube_dl/YoutubeDL.py", line 2288, in urlopen
return self._opener.open(req, timeout=self._socket_timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 429, in open
response = self._open(req, data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 447, in _open
'_open', req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "./youtube-dl/youtube_dl/utils.py", line 1207, in https_open
req, **kwargs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1201, in do_open
r = h.getresponse(buffering=True)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1134, in getresponse
response.begin()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 442, in begin
version, status, reason = self._read_status()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 398, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 480, in readline
data = self._sock.recv(self._rbufsize)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.py", line 754, in recv
return self.read(buflen)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.py", line 641, in read
v = self._sslobj.read(len)


@dirkf commented on GitHub (Jan 6, 2022):

PR #30224 does include the necessary extra code for SIGI-type pages, but unfortunately, if understandably, the author didn't take up my suggestion to combine the additional code from my patch.

The timeout issue (Errno 60 on macOS, SSLError('The read operation timed out',) on Linux) is some weird blocking done by whatever fronts TikTok's pages (Akamai, apparently). In order to download the page for parsing, some cookie has to be sent, and a way to get it is to make a previous request to the site. In yt-dl, and also with my patch and with PR #30224, the extractor fetches https://www.tiktok.com/ before doing anything else. In yt-dlp, which works with your URL using the latest Git source, the code fetches the webpage itself twice, commenting that you get 403 otherwise.

A small change to the PR #30224 code improves on the yt-dlp approach: instead of fetching the whole page (GET request), just send a HEAD request; if a page is actually returned, rather than an error with a Set-Cookie header, it doesn't actually have to be downloaded.

--- old/youtube-dl/youtube_dl/extractor/tiktok.py
+++ new/youtube-dl/youtube_dl/extractor/tiktok.py
@@ -6,6 +6,7 @@
     compat_str,
     ExtractorError,
     float_or_none,
+    HEADRequest,
     int_or_none,
     str_or_none,
     try_get,
@@ -99,18 +100,27 @@
         }
     }]
 
-    def _real_initialize(self):
-        # Setup session (will set necessary cookies)
-        self._request_webpage(
-            'https://www.tiktok.com/', None, note='Setting up session')
-
     def _real_extract(self, url):
         video_id = self._match_id(url)
+
+        # dummy request to set cookies
+        self._request_webpage(
+            HEADRequest(url), video_id,
+            note=False, errnote='Could not send HEAD request to %s' % url,
+            fatal=False)
         webpage = self._download_webpage(url, video_id)
-        page_props = self._parse_json(self._search_regex(
-            r'<script[^>]+\bid=["\']__NEXT_DATA__[^>]+>\s*({.+?})\s*</script',
-            webpage, 'data'), video_id)['props']['pageProps']
-        data = try_get(page_props, lambda x: x['itemInfo']['itemStruct'], dict)
+        try:
+            page_props = self._parse_json(self._search_regex(
+                r'<script[^>]+\bid=["\']__NEXT_DATA__[^>]+>\s*({.+?})\s*</script',
+                webpage, 'data'), video_id)['props']['pageProps']
+            data = try_get(page_props, lambda x: x['itemInfo']['itemStruct'], dict)
+        except:
+            page_props = self._parse_json(self._search_regex(
+                r'<script[^>]+\bid=["\']sigi-persisted-data[^>]+>window\[\'SIGI_STATE\']=({.+?});window\[',
+                webpage, 'data'), video_id)
+            data = try_get(page_props, lambda x: x['ItemModule'][video_id], dict)
+            author = try_get(page_props, lambda x: x['UserModule']['users'][data['author']], dict)
+            data['author'] = author
         if not data and page_props.get('statusCode') == 10216:
             raise ExtractorError('This video is private', expected=True)
         return self._extract_video(data, video_id)

Then:

$ python -m youtube_dl -v -F 'https://www.tiktok.com/@manoloteachesgolf/video/6987753546036923653'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://www.tiktok.com/@manoloteachesgolf/video/6987753546036923653']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.06.06
[debug] Git HEAD: bd7d796ef
[debug] Python version 2.7.17 (CPython) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[TikTok] 6987753546036923653: Downloading webpage
[info] Available formats for 6987753546036923653:
format code  extension  resolution note
0            mp4        576x1024   
df@Spiridion:~/Documents/src/youtube-dl$ python -m youtube_dl -v -F 'https://www.tiktok.com/@manoloteachesgolf/video/6987753546036923653'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://www.tiktok.com/@manoloteachesgolf/video/6987753546036923653']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: bd7d796ef
[debug] Python version 2.7.17 (CPython) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[TikTok] 6987753546036923653: Downloading webpage
[info] Available formats for 6987753546036923653:
format code  extension  resolution note
0            mp4        576x1024   
$

I guess we need a new PR.
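For anyone who wants to try the two-layout fallback outside the extractor, here is a minimal standalone sketch using only the standard library. It reuses the two regular expressions from the patch; the function name `find_item` and the decision to return `None` when neither layout matches are my own assumptions, not part of youtube-dl.

```python
import json
import re

# Patterns taken from the patch above: the older __NEXT_DATA__ layout and
# the newer SIGI_STATE layout.
NEXT_DATA_RE = re.compile(
    r'''<script[^>]+\bid=["']__NEXT_DATA__[^>]+>\s*({.+?})\s*</script''')
SIGI_STATE_RE = re.compile(
    r'''<script[^>]+\bid=["']sigi-persisted-data[^>]+>'''
    r'''window\['SIGI_STATE'\]=({.+?});window\[''')


def find_item(webpage, video_id):
    """Return the item dict for video_id, or None if neither layout matches.

    Hypothetical helper for illustration; not a youtube-dl API.
    """
    # First try the __NEXT_DATA__ layout.
    match = NEXT_DATA_RE.search(webpage)
    if match:
        try:
            props = json.loads(match.group(1))['props']['pageProps']
            item = props['itemInfo']['itemStruct']
            if isinstance(item, dict):
                return item
        except (ValueError, KeyError, TypeError):
            pass  # fall through to the SIGI_STATE layout

    # Then try the SIGI_STATE layout.
    match = SIGI_STATE_RE.search(webpage)
    if not match:
        return None
    try:
        state = json.loads(match.group(1))
        item = state['ItemModule'][video_id]
    except (ValueError, KeyError, TypeError):
        return None
    if not isinstance(item, dict):
        return None

    # In this layout the item only names its author; the full record lives
    # under UserModule, so copy it in when it is available.
    item = dict(item)
    try:
        author = state['UserModule']['users'][item['author']]
    except (KeyError, TypeError):
        author = None
    if isinstance(author, dict):
        item['author'] = author
    return item
```

Unlike the patch, this version catches only the specific exceptions it expects and does not assign into `data` when the lookup found nothing, so a page that matches neither layout yields `None` instead of a `TypeError`.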


@someziggyman commented on GitHub (Jan 6, 2022):

Wow! Thank you so much for such a detailed reply! I implemented this patch and it works for me!
I appreciate your help and patience!


@dirkf commented on GitHub (Jan 7, 2022):

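PR #30479, linked above, should address all current issues; it provides this version of the extractor: https://github.com/ytdl-org/youtube-dl/raw/b428754fa92ee50a2cef2ddbab019958026f88db/youtube_dl/extractor/tiktok.py. Please report any issues there.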


@hessijames79 commented on GitHub (Jan 7, 2022):

Looks good so far and just downloaded a small batch of videos that didn't work before.


@Menard01 commented on GitHub (Jan 26, 2022):

To download from TikTok, you only have to right-click the video and choose "Download the video".


@dirkf commented on GitHub (Jan 31, 2022):

Apparently we now have to pretend to be a mobile phone (on Windows, wrap the value in double quotes "..." rather than single quotes '...'):
--user-agent 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) CriOS/56.0.2924.75 Mobile/14E5239e Safari/602.1'
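If the quoting difference between shells is a nuisance, one way round it is to launch youtube-dl from Python with an argument list, so that no shell parses the user-agent string at all. The sketch below is only an illustration: `build_command` and `get_video_url` are hypothetical helper names, and it assumes `youtube_dl` is importable by the running interpreter (as in the `python -m youtube_dl` invocations earlier in this thread).

```python
import subprocess
import sys

# The mobile user agent suggested above.
MOBILE_UA = (
    'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) '
    'AppleWebKit/602.1.50 (KHTML, like Gecko) '
    'CriOS/56.0.2924.75 Mobile/14E5239e Safari/602.1'
)


def build_command(url, user_agent=MOBILE_UA):
    """Build the argument list; each element is passed as-is, no quoting needed."""
    return [sys.executable, '-m', 'youtube_dl',
            '--user-agent', user_agent, '-g', url]


def get_video_url(url):
    """Run youtube-dl and return what -g prints (requires Python 3.7+)."""
    result = subprocess.run(
        build_command(url), capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Because the command is a list rather than a single string, the same code should behave the same way on Windows and on POSIX systems; `check=True` raises `subprocess.CalledProcessError` if youtube-dl exits with an error.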


@unitof commented on GitHub (Jan 30, 2023):

Passing --user-agent 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) CriOS/56.0.2924.75 Mobile/14E5239e Safari/602.1' as suggested by @dirkf got me further, but I soon encountered a different error:

[TikTok] Setting up session
[TikTok] 7193925570105806126: Downloading webpage
ERROR: Unable to extract data; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
--verbose
youtube-dl https://www.tiktok.com/@special_head/video/7193925570105806126/ --user-agent 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) CriOS/56.0.2924.75 Mobile/14E5239e Safari/602.1' --verbose
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['https://www.tiktok.com/@special_head/video/7193925570105806126/', '--user-agent', 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) CriOS/56.0.2924.75 Mobile/14E5239e Safari/602.1', '--verbose']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 4b3d64d30
[debug] Python version 3.11.1 (CPython) - macOS-13.2-arm64-arm-64bit
[debug] exe versions: ffmpeg 5.1.2, ffprobe 5.1.2, rtmpdump 2.4
[debug] Proxy map: {}
[TikTok] Setting up session
[TikTok] 7193925570105806126: Downloading webpage
ERROR: Unable to extract data; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.11/site-packages/youtube_dl/YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.11/site-packages/youtube_dl/YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
                ^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.11/site-packages/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.11/site-packages/youtube_dl/extractor/tiktok.py", line 110, in _real_extract
    page_props = self._parse_json(self._search_regex(
                                  ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.11/site-packages/youtube_dl/extractor/common.py", line 1012, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
youtube_dl.utils.RegexNotFoundError: Unable to extract data; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

If it helps, yt-dlp seems to have a solution worked out, with no additional flags needed.
