some videos on spiegel.tv can't be downloaded #3447

Closed
opened 2026-02-20 23:23:30 -05:00 by deekerman · 4 comments
Owner

Originally created by @ritterherold on GitHub (Nov 15, 2014).

Looks like many videos on spiegel.de are using a different scheme now. The functionality implemented in #3011 (from @georgjaehnig) is still needed for some things (e.g. "Film zum Wochenende", currently http://www.spiegel.tv/#/filme/alleskino-die-wahrheit-ueber-maenner/) but most of the smaller stuff directly from Spiegel uses something else.

I noticed that the actual JS/Flash also requests an XML file (e.g. for the video http://www.spiegel.de/video/philae-geht-der-strom-aus-video-1535939.html, the XML URL is http://video2.spiegel.de/flash/19/95/1535991.xml) which contains some format info, but that seems to be unnecessary.

I wrote a small extractor which works for me, but due to time constraints I'm unable to turn it into a proper patch for youtube-dl. Feel free to use the code below as you see fit (you can just treat the code as public domain):

```
import urllib
import re
import sys
import subprocess

if len(sys.argv) != 2:
    print("%s <url>" % sys.argv[0])
    sys.exit(1)

html_url = sys.argv[1]

match = re.search(r'/video/(.+)\-video\-(\d+)\.html', html_url)
slug = match.group(1)
video_id = match.group(2)

html = urllib.urlopen(html_url).read()

server_regex = re.compile(r'var\s+server\s*=\s*"([^"]+)"\s*;')
server_url = server_regex.search(html).group(1)

filename_regex = re.compile(r"spStartVideo5\(%(video_id)s,\s+'(%(video_id)s_[^']+)',\s+" % dict(video_id=video_id))
filename = filename_regex.search(html).group(1)
video_url = server_url + filename
target_file = '%s.mp4' % slug
subprocess.call(['curl', '-o', target_file, video_url])
```
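
For readers who want to try the parsing steps without touching the network, here is a minimal Python 3 sketch of the same two regex lookups, split into functions. The function names (`parse_video_page_url`, `find_video_url`) and the `SAMPLE_HTML` snippet are made up for illustration; the sample only mimics the `var server = "...";` and `spStartVideo5(...)` patterns that the script above searches for, and is not copied from a real Spiegel page.

```python
import re


def parse_video_page_url(html_url):
    """Return (slug, video_id) taken from a spiegel.de video page URL."""
    match = re.search(r'/video/(.+)-video-(\d+)\.html', html_url)
    if match is None:
        raise ValueError('not a recognised video page URL: %s' % html_url)
    return match.group(1), match.group(2)


def find_video_url(html, video_id):
    """Combine the server prefix and the file name found in the page source."""
    server = re.search(r'var\s+server\s*=\s*"([^"]+)"\s*;', html)
    escaped_id = re.escape(video_id)
    filename = re.search(
        r"spStartVideo5\(%s,\s+'(%s_[^']+)',\s+" % (escaped_id, escaped_id),
        html,
    )
    if server is None or filename is None:
        raise ValueError('expected patterns not found in page source')
    return server.group(1) + filename.group(1)


# Hypothetical page fragment, invented for this example.
SAMPLE_HTML = '''
<script>
var server = "http://video.example.invalid/flash/";
spStartVideo5(1535939, '1535939_sample.mp4', 'x');
</script>
'''

slug, video_id = parse_video_page_url(
    'http://www.spiegel.de/video/philae-geht-der-strom-aus-video-1535939.html')
video_url = find_video_url(SAMPLE_HTML, video_id)
```

Unlike the original script, this version raises a `ValueError` when a pattern is missing instead of failing with an `AttributeError` on `None`.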

@phihag commented on GitHub (Nov 15, 2014):

I'm a little confused; while you mentioned spiegel.tv, your URL http://www.spiegel.de/video/philae-geht-der-strom-aus-video-1535939.html is unrelated to spiegel.tv, and handled by our spiegel.de extractor just fine:

```
$ youtube-dl http://www.spiegel.de/video/philae-geht-der-strom-aus-video-1535939.html
[Spiegel] 1535939: Downloading webpage
[Spiegel] 1535939: Downloading XML
[download] Destination: Mini-Labor mit Energiemangel - 'Philae' geht der Strom aus-1535939.mp4
[download] 100% of 11.95MiB in 00:00
```

@phihag commented on GitHub (Nov 15, 2014):

I've also added a match for http://www.spiegel.tv/#/filme/alleskino-die-wahrheit-ueber-maenner/. I'm closing this issue now, but if there is indeed any URL that does not yet work with youtube-dl, please state it.


@ritterherold commented on GitHub (Nov 15, 2014):

You're right, sorry for the bad report. I still had an alias for youtube-dl configured which requested some specific formats, and the resulting error message made me think that youtube-dl does not support the mentioned URLs. All is good now, and thank you for adding the hash URL.


@phihag commented on GitHub (Nov 15, 2014):

That's why we recommend adding fallback formats. So instead of `-f 42`, pass in `-f 42/best` to get the best available video format if 42 is unavailable.
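
To illustrate why the fallback matters, here is a simplified model of how a slash-separated format spec behaves. This is not youtube-dl's actual selector code: `pick_format` is a hypothetical helper, and it assumes the available format IDs are listed from worst to best, so that `best` means the last entry.

```python
def pick_format(spec, available):
    """Return the first alternative in `spec` that can be satisfied, else None.

    `spec` is a slash-separated list such as '42/best'.
    `available` is a list of format IDs, assumed to be ordered worst to best.
    """
    for alternative in spec.split('/'):
        if alternative == 'best':
            if available:
                return available[-1]
        elif alternative in available:
            return alternative
    return None


# Format 42 is not offered for this (made-up) video.
formats = ['18', '22']
without_fallback = pick_format('42', formats)
with_fallback = pick_format('42/best', formats)
```

In this model `without_fallback` is `None`, which corresponds to the "requested format not available" situation, while `with_fallback` falls through to the best format on offer.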
