[drtv] subtitle empty #24998

Open
opened 2026-02-21 12:18:35 -05:00 by deekerman · 4 comments

Originally created by @dpriskorn on GitHub (Feb 20, 2022).

Checklist

  • [x] I'm reporting a broken site support
  • [x] I've verified that I'm running youtube-dl version 2021.12.17
  • [x] I've checked that all provided URLs are alive and playable in a browser
  • [x] I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • [x] I've searched the bugtracker for similar issues including closed ones

Verbose log

```
$ youtube-dl -f HLS-560 https://www.dr.dk/drtv/se/spionkrigen-i-ringsted_-agenten_297110 --write-sub -v
[debug] System config: []
[debug] User config: ['--restrict-filenames']
[debug] Custom config: []
[debug] Command-line args: ['--prefer-free-formats', '-t', '-f', 'HLS-560', 'https://www.dr.dk/drtv/se/spionkrigen-i-ringsted_-agenten_297110', '--write-sub', '-v']
WARNING: --title is deprecated. Use -o "%(title)s-%(id)s.%(ext)s" instead.
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.10.2 (CPython) - Linux-5.10.89-gnu1-1-lts-x86_64-with-glibc2.35
[debug] exe versions: ffmpeg present, ffprobe present, rtmpdump 2.4
[debug] Proxy map: {}
[drtv] spionkrigen-i-ringsted_-agenten_297110: Downloading webpage
[drtv] 00242105010: Downloading video JSON
[drtv] 00242105010: Downloading m3u8 information
[drtv] 00242105010: Downloading m3u8 information
[drtv] 00242105010: Downloading m3u8 information
[info] Writing video subtitles to: Spionkrigen_i_Ringsted_1_4_-_Agenten-00242105010.da.vtt
[debug] Invoking downloader on 'https://drod09h-vh.akamaihd.net/i/all/clear/streaming/5d/61f40be9aa5a612b344e0c5d/Spionkrigen-i-Ringsted_b8e4eadf521344929e67691987d35f10_,500,1100,2000,3500,5500,.mp4.csmil/index_0_av.m3u8?null=0'
[download] Spionkrigen_i_Ringsted_1_4_-_Agenten-00242105010.mp4 has already been downloaded
[download] 100% of 126.82MiB
[debug] ffmpeg command line: ffprobe -show_streams file:Spionkrigen_i_Ringsted_1_4_-_Agenten-00242105010.mp4
```

Description

The downloaded subtitle file is practically empty: it contains nothing but the WEBVTT header.

```
$ cat Spionkrigen_i_Ringsted_1_4_-_Agenten-00242105010.da.vtt
WEBVTT
```

The official player has subtitles that can be enabled (the video contains a lot of Arabic speech).
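Because the dud file holds only the WEBVTT header, a check for cue timing lines is enough to tell it apart from a real subtitle file, which matters when bulk-archiving. A minimal sketch; the function names are illustrative and not part of youtube-dl:

```python
import re

# A WebVTT cue timing line such as "00:00:01.000 --> 00:00:04.000"
# (the hours field is optional), possibly followed by cue settings.
CUE_TIMING = re.compile(
    r'(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}'
)


def count_cues(vtt_text):
    """Count cue timing lines in a WebVTT document."""
    return sum(1 for line in vtt_text.splitlines() if CUE_TIMING.match(line.strip()))


def is_dud_vtt(vtt_text):
    """True if the file contains no cues at all, e.g. just the WEBVTT header."""
    return count_cues(vtt_text) == 0


print(is_dud_vtt('WEBVTT\n\n'))  # True, like the file shown above
```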

@dirkf commented on GitHub (Feb 20, 2022):

I can reproduce this. It looks like the same issue would be seen with yt-dlp too, so no easy fix there.

There are pending fixes for DRTV, but I don't think subtitles are affected, so some debugging is needed.

@dpriskorn commented on GitHub (Feb 21, 2022):

I debugged a little.
This request for master.m3u8 contains the link to the VTT playlist with all the segments:

```
curl 'https://drod09h-vh.akamaihd.net/i/all/clear/streaming/5d/61f40be9aa5a612b344e0c5d/Spionkrigen-i-Ringsted_b8e4eadf521344929e67691987d35f10_,500,1100,2000,3500,5500,.mp4.csmil/master.m3u8?cc1=name=Fremmedsprogstekster~default=yes~forced=no~lang=da~uri=https://drod09h-vh.akamaihd.net/p/allx/clear/download/5d/61f40be9aa5a612b344e0c5d/subtitles/Foreign-19324737-6e40ee8a-1569-4676-bdc0-3396bd943dd1/playlist.m3u8&cc2=name=Dansk~default=no~forced=no~lang=da~uri=https://drod09h-vh.akamaihd.net/p/allx/clear/download/5d/61f40be9aa5a612b344e0c5d/subtitles/Foreign_HardOfHearing-19324737-6e40ee8a-1569-4676-bdc0-3396bd943dd1/playlist.m3u8' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0' -H 'Accept: */*' -H 'Accept-Language: en-US,en;q=0.5' -H 'Accept-Encoding: gzip, deflate, br' -H 'Origin: https://www.dr.dk' -H 'Connection: keep-alive' -H 'Referer: https://www.dr.dk/' -H 'Sec-Fetch-Dest: empty' -H 'Sec-Fetch-Mode: cors' -H 'Sec-Fetch-Site: cross-site'
```
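The master playlist URL requested by the player carries both subtitle tracks in its `cc1`/`cc2` query parameters. The sketch below decodes them, assuming (from this one captured URL only) that each `ccN` value is a `~`-separated list of `key=value` pairs; the function name is hypothetical. Note that both tracks report `lang=da`, so the language tag alone cannot tell them apart.

```python
from urllib.parse import parse_qsl, urlsplit


def subtitle_tracks_from_master_url(master_url):
    """Decode the ccN query parameters of a master.m3u8 URL into dicts.

    Assumes each ccN value is a '~'-separated list of key=value pairs,
    as in the URL captured from the official player.
    """
    tracks = []
    for key, value in parse_qsl(urlsplit(master_url).query):
        if key.startswith('cc') and key[2:].isdigit():
            # split on the first '=' only, so the subtitle URI stays intact
            tracks.append(dict(part.split('=', 1) for part in value.split('~') if '=' in part))
    return tracks


MASTER_URL = (
    'https://drod09h-vh.akamaihd.net/i/all/clear/streaming/5d/61f40be9aa5a612b344e0c5d/'
    'Spionkrigen-i-Ringsted_b8e4eadf521344929e67691987d35f10_,500,1100,2000,3500,5500,.mp4.csmil/master.m3u8'
    '?cc1=name=Fremmedsprogstekster~default=yes~forced=no~lang=da'
    '~uri=https://drod09h-vh.akamaihd.net/p/allx/clear/download/5d/61f40be9aa5a612b344e0c5d/subtitles/'
    'Foreign-19324737-6e40ee8a-1569-4676-bdc0-3396bd943dd1/playlist.m3u8'
    '&cc2=name=Dansk~default=no~forced=no~lang=da'
    '~uri=https://drod09h-vh.akamaihd.net/p/allx/clear/download/5d/61f40be9aa5a612b344e0c5d/subtitles/'
    'Foreign_HardOfHearing-19324737-6e40ee8a-1569-4676-bdc0-3396bd943dd1/playlist.m3u8'
)

for track in subtitle_tracks_from_master_url(MASTER_URL):
    print(track['name'], track['lang'], track['uri'])
```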

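The link is in a comment: https://drod09h-vh.akamaihd.net/p/allx/clear/download/5d/61f40be9aa5a612b344e0c5d/subtitles/Foreign-19324737-6e40ee8a-1569-4676-bdc0-3396bd943dd1/playlist.m3u8
That playlist lists the VTT segments, and e.g. segment 6 (https://drod09k-vh.akamaihd.net/p/allx/clear/download/5d/61f40be9aa5a612b344e0c5d/subtitles/Foreign-19324737-6e40ee8a-1569-4676-bdc0-3396bd943dd1/segment_6.vtt) has subtitles.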
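Turning such a subtitle playlist into a list of segment URLs only needs the non-comment lines of the HLS media playlist. A sketch, assuming the playlist follows the usual HLS layout; the real playlist body was only shown as a screenshot, so `SAMPLE_PLAYLIST` below is hypothetical. Whether DR uses relative or absolute segment URIs is not visible from the issue (segment 6 is served from a different `drod09k` host), so the sketch handles both via `urljoin`.

```python
from urllib.parse import urljoin


def segment_urls(playlist_url, playlist_text):
    """List the absolute segment URLs of an HLS media playlist.

    Non-empty lines that do not start with '#' are segment URIs; relative
    ones are resolved against the playlist URL, absolute ones pass through.
    """
    return [
        urljoin(playlist_url, line.strip())
        for line in playlist_text.splitlines()
        if line.strip() and not line.strip().startswith('#')
    ]


PLAYLIST_URL = (
    'https://drod09h-vh.akamaihd.net/p/allx/clear/download/5d/61f40be9aa5a612b344e0c5d/'
    'subtitles/Foreign-19324737-6e40ee8a-1569-4676-bdc0-3396bd943dd1/playlist.m3u8'
)
# Hypothetical playlist body for illustration only.
SAMPLE_PLAYLIST = '''#EXTM3U
#EXT-X-TARGETDURATION:60
#EXTINF:60.000,
segment_0.vtt
#EXTINF:60.000,
segment_1.vtt
#EXT-X-ENDLIST
'''

for url in segment_urls(PLAYLIST_URL, SAMPLE_PLAYLIST):
    print(url)
```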

@dirkf commented on GitHub (Feb 21, 2022):

So --list-subs gives

```
Available subtitles for 00242105010:
Language formats
da       vtt, vtt, vtt, vtt, vtt
```

while --all-subs just downloads the single dud .vtt.

The extractor looks up the show in https://www.dr.dk/mu-online/api/1.4/programcard. There are three assets in the returned programme metadata, each with two subtitle URLs, except the third which only has one.

```json
[
  [
    {
      "MimeType": "text/vtt;charset=utf-8",
      "Type": "Foreign",
      "Uri": "https://drod09h-vh.akamaihd.net/p/allx/clear/download/5d/61f40be9aa5a612b344e0c5d/subtitles/Foreign-19324737-6e40ee8a-1569-4676-bdc0-3396bd943dd1.vtt",
      "Language": "Danish"
    },
    {
      "MimeType": "text/vtt;charset=utf-8",
      "Type": "Foreign_HardOfHearing",
      "Uri": "https://drod09h-vh.akamaihd.net/p/allx/clear/download/5d/61f40be9aa5a612b344e0c5d/subtitles/Foreign_HardOfHearing-19324737-6e40ee8a-1569-4676-bdc0-3396bd943dd1.vtt",
      "Language": "Danish"
    }
  ],
  [
    {
      "MimeType": "text/vtt;charset=utf-8",
      "Type": "Foreign",
      "Uri": "https://drod04f-vh.akamaihd.net/p/allx/clear/download/89/61f40c63af5a612af86c7e89/subtitles/Foreign-19324737-6e40ee8a-1569-4676-bdc0-3396bd943dd1.vtt",
      "Language": "Danish"
    },
    {
      "MimeType": "text/vtt;charset=utf-8",
      "Type": "Foreign_HardOfHearing",
      "Uri": "https://drod04f-vh.akamaihd.net/p/allx/clear/download/89/61f40c63af5a612af86c7e89/subtitles/Foreign_HardOfHearing-19324737-6e40ee8a-1569-4676-bdc0-3396bd943dd1.vtt",
      "Language": "Danish"
    }
  ],
  [
    {
      "MimeType": "text/vtt;charset=utf-8",
      "Type": "Foreign_HardOfHearing",
      "Uri": "https://drod01e-vh.akamaihd.net/p/allx/clear/download/9e/61fbf808a95a612450c32a9e/subtitles/Foreign_HardOfHearing-19324737-09b133ba-24c7-4b05-a548-a12d9396d3f0.vtt",
      "Language": "Danish"
    }
  ]
]
```

This becomes clearer when we look at each asset:

```
(Pdb) p assets[0]
{u'Kind': u'VideoResource', u'Target': u'Default', ...
(Pdb) p assets[1]
{u'Kind': u'VideoResource', u'Target': u'SpokenSubtitles', ...
(Pdb) p assets[2]
{u'Kind': u'VideoResource', u'Target': u'SignLanguage', ...
```
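How all five files collapse into a single `da` entry can be modelled in a few lines. This is a simplified sketch, not the extractor's actual code: the asset structure (a `Subtitles` key), the one-entry `LANGS` map and the placeholder URIs are assumptions; only the `Target`, `Type` and `Language` values come from the data above.

```python
# Simplified model of the three programcard assets shown above. The
# 'Subtitles' key and the placeholder URIs are assumptions for this sketch.
ASSETS = [
    {'Target': 'Default', 'Subtitles': [
        {'Type': 'Foreign', 'Language': 'Danish', 'Uri': 'default-foreign.vtt'},
        {'Type': 'Foreign_HardOfHearing', 'Language': 'Danish', 'Uri': 'default-hoh.vtt'},
    ]},
    {'Target': 'SpokenSubtitles', 'Subtitles': [
        {'Type': 'Foreign', 'Language': 'Danish', 'Uri': 'spoken-foreign.vtt'},
        {'Type': 'Foreign_HardOfHearing', 'Language': 'Danish', 'Uri': 'spoken-hoh.vtt'},
    ]},
    {'Target': 'SignLanguage', 'Subtitles': [
        {'Type': 'Foreign_HardOfHearing', 'Language': 'Danish', 'Uri': 'signed-hoh.vtt'},
    ]},
]

LANGS = {'Danish': 'da'}  # assumed subset of the extractor's language map


def collect_subtitles(assets):
    """Group subtitle entries by language tag only; the asset Target is ignored."""
    subtitles = {}
    for asset in assets:
        for subs in asset['Subtitles']:
            lang = subs.get('Language') or 'da'
            subtitles.setdefault(LANGS.get(lang, lang), []).append(
                {'url': subs['Uri'], 'ext': 'vtt'})
    return subtitles


subtitles = collect_subtitles(ASSETS)
print({lang: len(entries) for lang, entries in subtitles.items()})  # {'da': 5}
```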

If no --sub-format option is specified, the fifth URL in the list above is selected as the best, because the subtitle list is supposed to be sorted from least to most preferred; and that URL is the dud subtitle file for the signed version.
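The selection rule can be modelled as below. This is a simplified sketch of the behaviour just described, not youtube-dl's own code (the real logic lives in `YoutubeDL.process_subtitles`); the URLs are placeholders standing for the five entries in listing order. Since every entry is a vtt, the file extension cannot discriminate, so the last entry, the dud from the SignLanguage asset, wins whatever the preference.

```python
def pick_subtitle(formats, sub_format='best'):
    """Choose one subtitle entry for a language.

    sub_format is a '/'-separated preference list such as 'srt/best'.
    'best' takes the last entry, because the list is assumed to be sorted
    from least to most preferred; only the 'ext' field can discriminate.
    """
    for ext in sub_format.split('/'):
        if ext == 'best':
            return formats[-1]
        matches = [f for f in formats if f['ext'] == ext]
        if matches:
            return matches[-1]
    # nothing in the preference list matched: fall back to the last entry
    return formats[-1]


# The five Danish entries in listing order (placeholder URLs).
DA_SUBS = [{'url': name, 'ext': 'vtt'} for name in (
    'default-foreign.vtt', 'default-hoh.vtt',
    'spoken-foreign.vtt', 'spoken-hoh.vtt',
    'signed-hoh.vtt',
)]

print(pick_subtitle(DA_SUBS)['url'])         # signed-hoh.vtt
print(pick_subtitle(DA_SUBS, 'vtt')['url'])  # signed-hoh.vtt
```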

There are several problems here:

  • the extractor demotes video formats for Targets that are SpokenSubtitles, SignLanguage, or VisuallyInterpreted, but doesn't distinguish the corresponding subtitles;
  • yt-dl has no preference attribute for subtitles: the language tag and subtitle type (vtt/srt/...) are the only available discriminators;
  • the --all-subs option doesn't actually download every subtitle listed by --list-subs (even though the help says it should 'Download all the available subtitles of the video'), but just the preferred one for each language.

Looking at the available subtitles:

  1. Danish Foreign: presumably the non-Danish speech is rendered;
  2. Danish ForeignHardOfHearing: presumably all speech is rendered?
  3. Danish Foreign SpokenSubtitles: presumably the non-Danish speech is rendered (actually same as 1);
  4. Danish ForeignHardOfHearing SpokenSubtitles: presumably all speech is rendered? (probably same as 2)
  5. Danish Foreign SignLanguage: it's not obvious why there shouldn't be valid subtitles available so that non-DSL 'speakers' can also watch and get the foreign speech translated.

It's also not obvious why there shouldn't be Danish ForeignHardOfHearing SignLanguage subtitles to allow DSL and non-DSL speakers to watch together.

The extractor could assign a different language code for subtitles extracted from SignLanguage and VisuallyInterpreted Targets, such as sgn-dsl.

There isn't an official language code meaning "translations into language X from other languages, plus the original language X". Maybe we could invent dan-da for this.

```diff
--- old/youtube-dl/youtube_dl/extractor/drtv.py
+++ new/youtube-dl/youtube_dl/extractor/drtv.py
@@ -15,6 +15,7 @@
     int_or_none,
     intlist_to_bytes,
     float_or_none,
+    ISO639Utils,
     mimetype2ext,
     str_or_none,
     try_get,
@@ -268,7 +269,12 @@
                     if not sub_uri:
                         continue
                     lang = subs.get('Language') or 'da'
-                    subtitles.setdefault(LANGS.get(lang, lang), []).append({
+                    lang = LANGS.get(lang, lang)
+                    if asset_target in ('SignLanguage', 'VisuallyInterpreted'):
+                        lang = 'sgn' + ('-dsl' if lang == 'da' else '')
+                    elif 'HardOfHearing' in subs.get('Type', ''):
+                        lang = '-'.join((ISO639Utils.short2long(lang), lang))
+                    subtitles.setdefault(lang, []).append({
                         'url': sub_uri,
                         'ext': mimetype2ext(subs.get('MimeType')) or 'vtt'
                     })
```

With this patch, --list-subs gives the output below, and the Danish Foreign subtitle file can be selected with --sub-lang da:

```
Available subtitles for 00242105010:
Language formats
sgn-dsl  vtt
dan-da   vtt, vtt
da       vtt, vtt
```
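To check that the patched language-tag rule yields exactly the three groups listed above, it can be pulled out into a standalone function. This sketch replaces `ISO639Utils.short2long` with a one-entry dict (an assumption) and takes the `(Target, Type)` pairs from the programcard data; it models only the patch's rule, not the whole extractor.

```python
from collections import Counter

# Assumed stand-in for youtube-dl's ISO639Utils.short2long; only the
# entry needed here is included.
SHORT2LONG = {'da': 'dan'}


def subtitle_lang(lang, asset_target, sub_type):
    """Language tag chosen by the patch above for one subtitle entry."""
    if asset_target in ('SignLanguage', 'VisuallyInterpreted'):
        return 'sgn' + ('-dsl' if lang == 'da' else '')
    if 'HardOfHearing' in (sub_type or ''):
        # fall back to the short code for unknown languages (sketch-only choice)
        return '-'.join((SHORT2LONG.get(lang, lang), lang))
    return lang


# (Target, Type) of the five Danish subtitle entries, in listing order
ENTRIES = [
    ('Default', 'Foreign'), ('Default', 'Foreign_HardOfHearing'),
    ('SpokenSubtitles', 'Foreign'), ('SpokenSubtitles', 'Foreign_HardOfHearing'),
    ('SignLanguage', 'Foreign_HardOfHearing'),
]

counts = Counter(subtitle_lang('da', target, sub_type) for target, sub_type in ENTRIES)
print(dict(counts))  # {'da': 2, 'dan-da': 2, 'sgn-dsl': 1}
```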

Although yt-dlp processes subtitles slightly differently, this issue would apply there too.

@ghault commented on GitHub (Feb 9, 2025):

Got around this with --all-subs myself. The problem with DRTV is that many original and historically important documentaries expire and are taken down after a set number of years, so you can't come back later and pull the subtitles. That makes this issue more frustrating, for archival as well as personal purposes.

Reference
starred/youtube-dl#24998