Dissecting YouTube URLs with Python - 2020
This tutorial is not complete: the code presented does not work yet, and more work remains to be done.
For downloading videos from YouTube or Vimeo, youtube-dl (https://rg3.github.io/youtube-dl/) seems to be the best tool.
Please consult my tutorial, "Downloading YouTube, Vimeo etc Videos using youtube-dl".
We'll dissect YouTube URLs using the following video (Fox Snow Dive - Yellowstone - BBC Two).
In the URL portion of the browser, we have the following Watch URL for the video:
https://www.youtube.com/watch?v=dP15zlyra3c
We're going to parse the URL, because we need the value of the v parameter from the YouTube Watch URL above:
from urllib import urlopen, unquote
from urlparse import parse_qs, urlparse

youtube_watchurl = "https://www.youtube.com/watch?v=dP15zlyra3c"

q = urlparse(youtube_watchurl).query
print 'query : ', q
The expression urlparse(youtube_watchurl).query returns the query part of the URL above. Run it:
query : v=dP15zlyra3c
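The snippets in this tutorial use the Python 2 module layout. For readers on Python 3, where urlparse lives in urllib.parse, the same step might look like the sketch below. It only splits the URL string locally and makes no network request.

```python
from urllib.parse import urlparse

youtube_watchurl = "https://www.youtube.com/watch?v=dP15zlyra3c"

# urlparse splits the URL into named components; we only need .query,
# but printing the others shows where each piece of the URL ends up.
parts = urlparse(youtube_watchurl)

print('scheme :', parts.scheme)
print('netloc :', parts.netloc)
print('path   :', parts.path)
print('query  :', parts.query)
```

The query attribute holds everything after the question mark, which is the same string the Python 2 code printed.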
Let's add lines that convert the query string into a dictionary:
from urllib import urlopen, unquote
from urlparse import parse_qs, urlparse

youtube_watchurl = "https://www.youtube.com/watch?v=dP15zlyra3c"

q = urlparse(youtube_watchurl).query
print 'query : ', q

qs = parse_qs(urlparse(youtube_watchurl).query)
print 'parse_qs : ', qs
Run it:
query : v=dP15zlyra3c
parse_qs : {'v': ['dP15zlyra3c']}
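The parse_qs function (urlparse.parse_qs in Python 2, urllib.parse.parse_qs in Python 3) returned the parameter name ('v') and its value ('dP15zlyra3c') as a dictionary. Each value is a list, because a parameter may appear more than once in a query string.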
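To see why parse_qs wraps every value in a list, here is a small Python 3 sketch. The query string is made up for illustration: the repeated 'tag' parameter is an assumption and does not come from the Watch URL above.

```python
from urllib.parse import parse_qs, parse_qsl

# Hypothetical query string: 'tag' is repeated on purpose and is not
# a parameter taken from the tutorial's URL.
query = 'v=dP15zlyra3c&tag=fox&tag=snow'

# parse_qs groups the values of each name into a list.
as_dict = parse_qs(query)
print(as_dict)

# parse_qsl keeps the original order as a list of (name, value) pairs.
as_pairs = parse_qsl(query)
print(as_pairs)

# A single-valued parameter is still a one-element list, hence the [0].
first_v = as_dict['v'][0]
print(first_v)
```

This is why the next step indexes the result with ['v'][0] rather than ['v'] alone.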
Let's continue from the previous section. This time, we want to get the video id:
from urllib import urlopen, unquote
from urlparse import parse_qs, urlparse

youtube_watchurl = "https://www.youtube.com/watch?v=dP15zlyra3c"

q = urlparse(youtube_watchurl).query
print 'query : ', q

qs = parse_qs(urlparse(youtube_watchurl).query)
print 'parse_qs : ', qs

video_id = parse_qs(urlparse(youtube_watchurl).query)['v'][0]
print 'video_id : ', video_id
We're just indexing into the returned dictionary. Run the code:
query : v=dP15zlyra3c
parse_qs : {'v': ['dP15zlyra3c']}
video_id : dP15zlyra3c
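The lookup ['v'][0] raises a KeyError when the URL has no v parameter. A slightly more defensive Python 3 sketch could wrap the step in a helper; the name extract_video_id is hypothetical and not part of any library.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(watch_url):
    """Return the value of the 'v' query parameter, or None if absent.

    Hypothetical helper for illustration; it only inspects the URL
    string and never contacts YouTube.
    """
    query = urlparse(watch_url).query
    values = parse_qs(query).get('v')
    if not values:
        return None
    return values[0]


print(extract_video_id("https://www.youtube.com/watch?v=dP15zlyra3c"))
print(extract_video_id("https://www.youtube.com/"))
```

Returning None leaves it to the caller to decide whether a missing id is an error.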
Next, we request information about the video:
from urllib import urlopen, unquote
from urlparse import parse_qs, urlparse

youtube_watchurl = "https://www.youtube.com/watch?v=dP15zlyra3c"

q = urlparse(youtube_watchurl).query
print 'query : ', q

qs = parse_qs(urlparse(youtube_watchurl).query)
print 'parse_qs : ', qs

video_id = parse_qs(urlparse(youtube_watchurl).query)['v'][0]
print 'video_id : ', video_id

video_info = urlopen('https://www.youtube.com/get_video_info?&video_id=' + video_id).read().decode('utf-8')
print 'video_info : ', video_info
If we run it, we get a huge string for video_info:
query : v=dP15zlyra3c
parse_qs : {'v': ['dP15zlyra3c']}
video_id : dP15zlyra3c
video_info : cl=112110886&fmt_list=... 2Fhqdefault.jpg
Just as we retrieved the 'v' parameter value, we now want to retrieve the 'token' parameter value:
from urllib import urlopen, unquote
from urlparse import parse_qs, urlparse

youtube_watchurl = "https://www.youtube.com/watch?v=dP15zlyra3c"

q = urlparse(youtube_watchurl).query
print 'query : ', q

qs = parse_qs(urlparse(youtube_watchurl).query)
print 'parse_qs : ', qs

video_id = parse_qs(urlparse(youtube_watchurl).query)['v'][0]
print 'video_id : ', video_id

video_info = urlopen('https://www.youtube.com/get_video_info?&video_id=' + video_id).read().decode('utf-8')
print 'video_info : ', video_info

token = parse_qs(unquote(video_info))['token'][0]
print 'token : ', token
In the code, the urllib.unquote(string) function replaces %xx escapes with their single-character equivalents. For example:
unquote('/%7Econnolly/') yields '/~connolly/'
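In Python 3 the same function is urllib.parse.unquote. The sketch below repeats the example and also shows that parse_qs decodes %xx escapes by itself. The sample string is invented for illustration and is not a real get_video_info response, so whether the difference matters for the real response is not something this sketch checks.

```python
from urllib.parse import parse_qs, unquote

print(unquote('/%7Econnolly/'))  # /~connolly/

# Made-up sample: the title value contains an escaped ampersand (%26).
sample = 'token=abc%3D&title=Fox%20%26%20Snow'

# parse_qs splits on '&' first and only then decodes each value.
direct = parse_qs(sample)
print(direct['title'][0])  # Fox & Snow

# Unquoting first turns %26 into a literal '&', so the title is then
# split at that ampersand and part of it is lost.
pre_unquoted = parse_qs(unquote(sample))
print(pre_unquoted['title'][0])
```

If a value can contain an escaped & or =, calling parse_qs on the raw string keeps it intact.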
Run the new code, and the token appears at the end of the output:
query : v=dP15zlyra3c
parse_qs : {'v': ['dP15zlyra3c']}
video_id : dP15zlyra3c
video_info : cl=112110886&fmt_list=...
token : vjVQa1PpcFOak8ftRuzqanvpn_UTUC3efzEwMH2uahE=
Now that we have the video id and the token string, let's put these values into the direct URL to download the video. The format of the direct URL looks like this:
http://www.youtube.com/get_video?video_id=video_id&t=token&fmt=18
So, if we construct the download URL from the video_id and the token, we have the following URL:
http://www.youtube.com/get_video?video_id=dP15zlyra3c&t=vjVQa1PpcFOak8ftRuzqanvpn_UTUC3efzEwMH2uahE=&fmt=18
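As an alternative to string formatting, the query part can be assembled with urllib.parse.urlencode in Python 3, which escapes the values for us. This is only a sketch of the URL construction using the id and token printed above; it says nothing about whether the get_video endpoint answers such a request.

```python
from urllib.parse import parse_qs, urlencode, urlparse

video_id = 'dP15zlyra3c'
token = 'vjVQa1PpcFOak8ftRuzqanvpn_UTUC3efzEwMH2uahE='

# urlencode percent-escapes reserved characters in the values,
# so the trailing '=' of the token becomes %3D.
params = {'video_id': video_id, 't': token, 'fmt': 18}
download_url = 'http://www.youtube.com/get_video?' + urlencode(params)
print(download_url)

# Parsing the URL again recovers the original token unchanged.
round_trip = parse_qs(urlparse(download_url).query)
print(round_trip['t'][0] == token)
```

The escaped form differs from the hand-built URL only in how the token's final character is written.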
from urllib import urlopen, unquote
from urlparse import parse_qs, urlparse

youtube_watchurl = "https://www.youtube.com/watch?v=dP15zlyra3c"

q = urlparse(youtube_watchurl).query
print 'query : ', q

qs = parse_qs(urlparse(youtube_watchurl).query)
print 'parse_qs : ', qs

video_id = parse_qs(urlparse(youtube_watchurl).query)['v'][0]
print 'video_id : ', video_id

video_info = urlopen('https://www.youtube.com/get_video_info?&video_id=' + video_id).read().decode('utf-8')
print 'video_info : ', video_info

token = parse_qs(unquote(video_info))['token'][0]
print 'token : ', token

open(video_id + '.mp4', 'wb').write(urlopen(
    "http://www.youtube.com/get_video?video_id=%s&t=%s&fmt=18" % (
        video_id,
        parse_qs(unquote(urlopen('http://www.youtube.com/get_video_info?&video_id=' + video_id).read().decode('utf-8')))['token'][0])).read())
However, it did not download the file. Epic failure!
I got the following code from StackOverflow - Can't download youtube video, but it did not work, either.
#!/usr/bin/env python3
import sys
import urllib.request
from urllib.request import urlopen, FancyURLopener
from urllib.parse import urlparse, parse_qs, unquote

class UndercoverURLopener(FancyURLopener):
    version = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.9 Safari/533.2"

urllib.request._urlopener = UndercoverURLopener()

def youtube_download(video_url):
    video_id = parse_qs(urlparse(video_url).query)['v'][0]

    url_data = urlopen('http://www.youtube.com/get_video_info?&video_id=' + video_id).read()
    url_info = parse_qs(unquote(url_data.decode('utf-8')))
    token_value = url_info['token'][0]

    download_url = "http://www.youtube.com/get_video?video_id={0}&t={1}&fmt=18".format(
        video_id, token_value)

    video_title = url_info['title'][0] if 'title' in url_info else ''
    # Unicode filenames are more trouble than they're worth
    filename = video_title.encode('ascii', 'ignore').decode('ascii').replace("/", "-") + '.mp4'

    print("\t Downloading '{}' to '{}'...".format(video_title, filename))

    try:
        download = urlopen(download_url).read()
        f = open(filename, 'wb')
        f.write(download)
        f.close()
    except Exception as e:
        print("\t Downlad failed! {}".format(str(e)))
        print("\t Skipping...")
    else:
        print("\t Done.")

def main():
    print("\n--------------------------")
    print (" Youtube Video Downloader")
    print ("--------------------------\n")

    try:
        video_urls = sys.argv[1:]
    except:
        video_urls = input('Enter (space-separated) video URLs: ')

    for u in video_urls:
        youtube_download(u)
    print("\n Done.")

if __name__ == '__main__':
    main()
Run it with Python 3:
$ python y2.py https://www.youtube.com/watch?v=dP15zlyra3c

--------------------------
 Youtube Video Downloader
--------------------------

	 Downloading 'Fox Snow Dive - Yellowstone - BBC Two' to 'Fox Snow Dive - Yellowstone - BBC Two.mp4'...
	 Done.

 Done.
Only an empty mp4 file was generated!
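The script reported "Done." because it only checks for exceptions, not for how many bytes arrived. A hedged Python 3 sketch of a stricter save step is shown below; save_nonempty is a hypothetical helper name, and an in-memory io.BytesIO object stands in for the object returned by urlopen, so nothing is downloaded here.

```python
import io
import os
import shutil
import tempfile


def save_nonempty(response, filename):
    """Copy a file-like response to disk and return the byte count.

    Hypothetical helper: raises ValueError and removes the file when
    the response body turns out to be empty.
    """
    with open(filename, 'wb') as f:
        shutil.copyfileobj(response, f)
        size = f.tell()  # position after writing equals bytes written
    if size == 0:
        os.remove(filename)
        raise ValueError('empty response, nothing saved to ' + filename)
    return size


# Stand-ins for real HTTP responses, written to a temporary directory.
workdir = tempfile.mkdtemp()
print(save_nonempty(io.BytesIO(b'fake video bytes'),
                    os.path.join(workdir, 'ok.mp4')))
try:
    save_nonempty(io.BytesIO(b''), os.path.join(workdir, 'empty.mp4'))
except ValueError as e:
    print('Download failed:', e)
```

With a check like this, an empty body would be reported as a failure rather than leaving a zero-byte mp4 behind.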