[Request] convert npo.nl webvtt-subtitles to srt #4022

Closed
opened 2026-02-21 01:46:52 -05:00 by deekerman · 7 comments
Owner

Originally created by @Reino17 on GitHub (Feb 15, 2015).

Originally assigned to: @jaimeMF on GitHub.

Frenzie and dstftw, thanks for the initial support (https://github.com/rg3/youtube-dl/issues/3638), but npo.nl subtitles are in the WebVTT format, which a lot of players don't support.
For yesterday's 8 o'clock news bulletin, for example, youtube-dl reports:

youtube-dl.exe -s --list-subs http://www.npo.nl/nos-journaal/14-02-2015/POW_00942207
[npo.nl] POW_00942206: Downloading JSON metadata
......
[npo.nl] POW_00942206: Downloading h264_std stream JSON
WARNING: Automatic Captions not supported by this server
[npo.nl] POW_00942206: Available subtitles for video: nl
[npo.nl] POW_00942206: Available automatic captions for video:

When I then download these subtitles:...

youtube-dl.exe --skip-download --write-sub --sub-format srt --sub-lang nl http://www.npo.nl/nos-journaal/14-02-2015/POW_00942207
[npo.nl] POW_00942207: Downloading JSON metadata
......
[npo.nl] POW_00942207: Downloading h264_std stream JSON
[info] Writing video subtitles to: NOS Journaal-POW_00942207.nl.srt

..., the first couple of lines look like this:

WEBVTT

1
00:00:03.000 --> 00:00:05.019
888

2
00:00:05.019 --> 00:00:12.005
Een schietpartij op een bijeenkomst in Kopenhagen met een omstreden Zweedse cartoonist.
  • First of all: If youtube-dl has the ability to detect the subtitle-format, could you perhaps show that in the process, like: [npo.nl] POW_00942206: Available subtitles for video: nl (vtt).
  • Secondly: Whether or not the subtitle-format is detected, --sub-format srt didn't do anything.
    In this case, with npo-subs, you'd only have to remove the first 2 lines to have fully working srt-subtitles.
    I was hoping that if youtube-dl detects vtt-subtitles, --sub-format srt would actually convert these to srt.
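
The manual fix described in the second point can be sketched in Python. This is a minimal sketch and not what youtube-dl does: it assumes a very simple file like the npo.nl sample (a `WEBVTT` header line followed by numbered cues, with no cue settings or styling), and the name `vtt_to_srt` is hypothetical. Besides dropping the header, it also rewrites the `.` in the timestamps to `,`, which is the millisecond separator SRT normally uses.

```python
import re

# Matches a simple WebVTT timing line such as "00:00:03.000 --> 00:00:05.019".
# Assumption: hours are always present and no cue settings follow the times.
TIMING = re.compile(r"^(\d{2}:\d{2}:\d{2})\.(\d{3}) --> (\d{2}:\d{2}:\d{2})\.(\d{3})$")


def vtt_to_srt(vtt_text):
    """Convert a very simple WebVTT document to SRT (hypothetical helper)."""
    lines = vtt_text.replace("\r\n", "\n").split("\n")
    # Drop the "WEBVTT" header line and the blank lines that follow it.
    if lines and lines[0].lstrip("\ufeff").startswith("WEBVTT"):
        lines = lines[1:]
        while lines and not lines[0].strip():
            lines = lines[1:]
    # SRT uses a comma as the millisecond separator.
    out = [TIMING.sub(r"\1,\2 --> \3,\4", line) for line in lines]
    return "\n".join(out)


sample = (
    "WEBVTT\n"
    "\n"
    "1\n"
    "00:00:03.000 --> 00:00:05.019\n"
    "888\n"
)
print(vtt_to_srt(sample))
```

Real WebVTT files can carry cue settings, styling blocks and hour-less timestamps, none of which this sketch handles, so a proper converter such as ffmpeg is still the safer route.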
deekerman 2026-02-21 01:46:52 -05:00
  • closed this issue
  • added the
    subtitles
    label

@jaimeMF commented on GitHub (Feb 15, 2015):

Running ffmpeg -i 'NOS Journaal-POW_00942207.nl.srt' something.srt seems to work fine, but I haven't tried with libav.

I'm working on properly handling subtitle formats (for #4019).


@Reino17 commented on GitHub (Feb 27, 2015):

My findings:

ffmpeg -hide_banner -i http://e.omroep.nl/tt888/POW_00942207 "NOS Journaal-POW_00942207.nl.srt"
...
  Stream #0:0 -> #0:0 (webvtt (native) -> subrip (srt))
Press [q] to stop, [?] for help
[webvtt @ 02f9f500] Invalid UTF-8 in decoded subtitles text; maybe missing -sub_charenc option

So I don't know about other websites, but in the case of npo.nl the command line needs to be:

ffmpeg -hide_banner -sub_charenc CP1252 -i http://e.omroep.nl/tt888/POW_00942207 "NOS Journaal-POW_00942207.nl.srt"
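
The same command can also be assembled programmatically. The sketch below only builds the argument list; `build_convert_cmd` is a hypothetical helper, not part of youtube-dl, and the only ffmpeg options it uses are ones shown in this thread (`-hide_banner`, `-sub_charenc`, `-i`). Note that `-sub_charenc` applies to an input, so it has to come before the `-i` it belongs to.

```python
def build_convert_cmd(src, dst, charenc=None):
    """Return an ffmpeg argument list that converts subtitles from src to dst."""
    cmd = ["ffmpeg", "-hide_banner"]
    if charenc:
        # Input option: must precede the -i it applies to.
        cmd += ["-sub_charenc", charenc]
    cmd += ["-i", src, dst]
    return cmd


cmd = build_convert_cmd(
    "http://e.omroep.nl/tt888/POW_00942207",
    "NOS Journaal-POW_00942207.nl.srt",
    charenc="CP1252",
)
print(cmd)
# To actually run it (needs ffmpeg on PATH and network access):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids having to quote the space in the output filename.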

Then muxing...
The only method that works without any errors is muxing a progressive videostream with subs to matroska:

ffmpeg -hide_banner -i http://content50c1b.omroep.nl/....../POW_00942207/std.20150214.m4v -sub_charenc CP1252 -i http://e.omroep.nl/tt888/POW_00942207 -map 0:0 -map 0:1 -c copy -map 1 -c:s srt -metadata:s:s:0 language=nl "NOS Journaal-POW_00942207.nl.m4v.mkv"

Adaptive videostream with subs to matroska:

ffmpeg -hide_banner -i http://....npostreaming.nl/.../POW_00942207-audio_eng%3D128000-video%3D1001000.m3u8 -sub_charenc CP1252 -i http://e.omroep.nl/tt888/POW_00942207 -c copy -bsf:a aac_adtstoasc -c:s srt -metadata:s:s:0 language=nl "NOS Journaal-POW_00942207.nl.m3u8.mkv"
...
[matroska @ 03bfefe0] Error parsing AAC extradata, unable to determine samplerate.

Here -bsf:a aac_adtstoasc doesn't seem to do anything. Unless I've overlooked something, there's no other option than to mux to a temporary mp4 file.

Adaptive videostream with subs to mp4:

ffmpeg -hide_banner -fflags genpts -i http://....npostreaming.nl/.../POW_00942207-audio_eng%3D128000-video%3D1001000.m3u8 -sub_charenc CP1252 -i http://e.omroep.nl/tt888/POW_00942207 -c copy -bsf:a aac_adtstoasc -c:s mov_text -metadata:s:s:0 language=nl "NOS Journaal-POW_00942207.nl.m3u8.mp4"
...
[mp4 @ 030420a0] Application provided duration: -1 / timestamp: 5019 is out of range for mov/mp4 format
[mp4 @ 030420a0] pts has no value

Here -fflags genpts doesn't seem to do anything. But despite these warnings (they're not errors), a working mp4 is created.

Progressive videostream with subs to mp4:
Same "pts has no value" as above.

I hope this helps you in your subtitle-quest.


@jaimeMF commented on GitHub (Feb 28, 2015):

In the next version, you'll be able to run youtube-dl 'http://www.npo.nl/nos-journaal/14-02-2015/POW_00942207' --write-sub --convert-subtitles srt and the vtt subtitles will be converted to srt.
If you want to embed them in a mkv file, please open a new issue.

Thanks for the report.


@Reino17 commented on GitHub (Feb 28, 2015):

Have you tested it yourself? Because it doesn't work. ffmpeg is detected but doesn't do anything, so there's no conversion at all, and the subs still have the .vtt extension.

D:\youtube-dl-master>python -m youtube_dl -v --skip-download --write-sub --convert-subtitles srt http://www.npo.nl/nos-journaal/07-02-2015/POW_00942206
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['-v', '--skip-download', '--write-sub', '--convert-subtitles', 'srt', 'http://www.npo.nl/nos-journaal/07-02-2015/POW_00942206']
[debug] Encodings: locale cp1252, fs mbcs, out cp437, pref cp1252
[debug] youtube-dl version 2015.02.26.2
[debug] Python version 3.4.2 - Windows-XP-5.1.2600-SP3
[debug] exe versions: ffmpeg N-69779-g2a72b16
[debug] Proxy map: {}
[npo.nl] POW_00942206: Downloading JSON metadata
[npo.nl] POW_00942206: Downloading token
[npo.nl] POW_00942206: Downloading adaptive JSON
[npo.nl] POW_00942206: Downloading adaptive stream JSON
[npo.nl] POW_00942206: Downloading m3u8 information
[npo.nl] POW_00942206: Downloading h264_bb JSON
[npo.nl] POW_00942206: Downloading h264_bb stream JSON
[npo.nl] POW_00942206: Downloading h264_sb JSON
[npo.nl] POW_00942206: Downloading h264_sb stream JSON
[npo.nl] POW_00942206: Downloading h264_std JSON
[npo.nl] POW_00942206: Downloading h264_std stream JSON
[info] Writing video subtitles to: NOS Journaal-POW_00942206.nl.vtt
WEBVTT

1
00:00:03.000 --> 00:00:04.018
888

2
00:00:04.018 --> 00:00:09.006
Op de veiligheidsconferentie in M�unchen klinkt
duidelijke taal over het Oekraiense conflict.
...

@jaimeMF commented on GitHub (Feb 28, 2015):

For a moment you made me doubt whether I had tested it :)

You just have to remove the --skip-download option. That's how the postprocessor system works: postprocessors aren't run if the video isn't downloaded, because most of them need it. (If you have already downloaded the video, youtube-dl will detect it and won't download it again.)


@Reino17 commented on GitHub (Feb 28, 2015):

D:\youtube-dl-master>python -m youtube_dl -v --write-sub --convert-subtitles srt http://www.npo.nl/nos-journaal/07-02-2015/POW_00942206
...
[info] Writing video subtitles to: NOS Journaal-POW_00942206.nl.vtt
[debug] Invoking downloader on 'http://content50c1b.omroep.nl/.../POW_00942206/std.20150207.m4v?odiredirecturl=...'
[download] NOS Journaal-POW_00942206.m4v has already been downloaded
[download] 100% of 177.21MiB
[ffmpeg] Converting subtitles
[debug] ffmpeg command line: ffmpeg -y -i 'NOS Journaal-POW_00942206.nl.vtt' -f srt 'NOS Journaal-POW_00942206.nl.srt'

I see. But could you perhaps have ffmpeg load the subtitle URL directly instead of separately downloading the WebVTT subtitles first?

NOS Journaal-POW_00942206.nl.srt by youtube-dl (and same as vtt):

2
00:00:04,018 --> 00:00:09,008
Op de veiligheidsconferentie in M�unchen klinkt
duidelijke taal over het Oekraiense conflict.

NOS Journaal-POW_00942206.nl.srt through ffmpeg
(ffmpeg -sub_charenc CP1252 -i 'NOS Journaal-POW_00942206.nl.vtt' 'NOS Journaal-POW_00942206.nl.srt'):

2
00:00:04,018 --> 00:00:09,008
Op de veiligheidsconferentie in M�unchen klinkt
duidelijke taal over het Oekraiense conflict.

NOS Journaal-POW_00942206.nl.srt through ffmpeg
(ffmpeg -sub_charenc CP1252 -i http://e.omroep.nl/tt888/POW_00942206 'NOS Journaal-POW_00942206.nl.srt'):

2
00:00:04,018 --> 00:00:09,008
Op de veiligheidsconferentie in MÈunchen klinkt
duidelijke taal over het Oekraiense conflict.

Obviously MÈunchen is still misspelled, but at least it now matches what npo.nl itself shows (in the Flash player).
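
The difference between these outputs is consistent with a single non-UTF-8 byte in the source file. As an illustration only (the byte value 0xC8 is an assumption, inferred from the È that ffmpeg produces with -sub_charenc CP1252; the raw bytes are never shown in this thread), Python decodes such a byte differently depending on the codec and error handling:

```python
# Assumed raw bytes for the word as served by npo.nl (0xC8 is inferred, not confirmed).
raw = b"M\xc8unchen"

# In UTF-8, 0xC8 starts a two-byte sequence, but "u" is not a continuation byte,
# so the byte is invalid and gets handled according to the error mode.
as_utf8_replace = raw.decode("utf-8", errors="replace")  # invalid byte becomes U+FFFD
as_utf8_ignore = raw.decode("utf-8", errors="ignore")    # invalid byte is dropped
as_cp1252 = raw.decode("cp1252")                         # 0xC8 maps to U+00C8

print(as_utf8_replace)
print(as_utf8_ignore)
print(as_cp1252)
```

Under that assumption, the replacement character in the first two files and the È in the third come from the same byte; only the decoding differs.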


@jaimeMF commented on GitHub (Feb 28, 2015):

You'll always end up downloading from the URL, and it's easier to always download them and only run ffmpeg if --convert-subtitles is given.

As for the character issue, the HTTP headers don't say which encoding they are using:

$ curl 'http://e.omroep.nl/tt888/POW_00942207' -v > /dev/null
* Adding handle: conn: 0x7fd4e0803000
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7fd4e0803000) send_pipe: 1, recv_pipe: 0
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* About to connect() to e.omroep.nl port 80 (#0)
*   Trying 145.58.30.39...
* Connected to e.omroep.nl (145.58.30.39) port 80 (#0)
> GET /tt888/POW_00942207 HTTP/1.1
> User-Agent: curl/7.30.0
> Host: e.omroep.nl
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Sat, 28 Feb 2015 22:30:20 GMT
* Server Apache is not blacklisted
< Server: Apache
< Cache-Control: private, max-age=300, must-revalidate
< Expires: Sat, 28 Feb 2015 22:35:20 GMT
< Last-Modified: Sat, 28 Feb 2015 22:30:20 GMT
< Access-Control-Allow-Origin: *
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: text/vtt
<
{ [data not shown]
100 28464    0 28464    0     0  30599      0 --:--:-- --:--:-- --:--:-- 30606
* Closing connection 0

So we can't detect it; we assume that it's UTF-8 and ignore errors. If they are always encoded with CP1252, we could download them inside the extractor and decode them properly.
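
As a rough sketch of what that could look like (not youtube-dl's actual extractor code): take the charset from the Content-Type header when one is declared, otherwise try UTF-8 and fall back to CP1252. The helper names `charset_from_content_type` and `decode_subtitles` are hypothetical, and the CP1252 fallback rests on the unverified assumption that npo.nl always uses that encoding.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def decode_subtitles(data, content_type, fallback="cp1252"):
    """Decode subtitle bytes: declared charset first, then UTF-8, then the fallback."""
    declared = charset_from_content_type(content_type)
    if declared:
        return data.decode(declared, errors="replace")
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the server sends CP1252 whenever the data is not valid UTF-8.
        return data.decode(fallback, errors="replace")


print(charset_from_content_type("text/vtt"))
print(charset_from_content_type("text/vtt; charset=utf-8"))
print(decode_subtitles(b"M\xc8unchen", "text/vtt"))
```

Trying strict UTF-8 first means files that are already valid UTF-8 are left alone, and the fallback only applies when decoding fails.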

Reference
starred/youtube-dl-ytdl-org#4022