Dissecting Youtube URL with Python - 2020

Note

This tutorial is incomplete: the code presented here does not yet download a video, so more work remains.

For downloading videos from YouTube or Vimeo, youtube-dl (https://rg3.github.io/youtube-dl/) seems to be the best tool.

Please consult my tutorial Downloading YouTube, Vimeo etc Videos using youtube-dl

Youtube Watch URL

We'll dissect YouTube URLs using the following video (Fox Snow Dive - Yellowstone - BBC Two).

In the URL portion of the browser, we have the following Watch URL for the video:

https://www.youtube.com/watch?v=dP15zlyra3c

We're going to parse the URL, and we need the v parameter value from the above Youtube Watch URL:

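# Python 3: urlopen lives in urllib.request, the parsing helpers in urllib.parse
from urllib.request import urlopen
from urllib.parse import parse_qs, urlparse, unquote

youtube_watchurl = "https://www.youtube.com/watch?v=dP15zlyra3c"
q = urlparse(youtube_watchurl).query   # everything after the '?'
print('query : ', q)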

The urlparse(youtube_watchurl).query returns the query part of the above URL. Run it:

query :  v=dP15zlyra3c
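
For context, urlparse splits a URL into several named components, and query is only one of them. A minimal Python 3 sketch, reusing the watch URL from above (the variable name parts is chosen here only for illustration):

```python
from urllib.parse import urlparse

youtube_watchurl = "https://www.youtube.com/watch?v=dP15zlyra3c"
parts = urlparse(youtube_watchurl)

# Each component is available as a named attribute of the result.
print('scheme :', parts.scheme)   # https
print('netloc :', parts.netloc)   # www.youtube.com
print('path   :', parts.path)     # /watch
print('query  :', parts.query)    # v=dP15zlyra3c
```

Only the query component matters for the rest of this tutorial, since that is where the v parameter lives.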

Let's add a few lines of code to convert the query string into a dictionary:

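# Python 3: urlopen lives in urllib.request, the parsing helpers in urllib.parse
from urllib.request import urlopen
from urllib.parse import parse_qs, urlparse, unquote

youtube_watchurl = "https://www.youtube.com/watch?v=dP15zlyra3c"
q = urlparse(youtube_watchurl).query
print('query : ', q)

# parse_qs turns 'name=value' pairs into a dictionary of lists
qs = parse_qs(urlparse(youtube_watchurl).query)
print('parse_qs : ', qs)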

Run it:

query :  v=dP15zlyra3c
parse_qs :  {'v': ['dP15zlyra3c']}

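The parse_qs function (urllib.parse.parse_qs in Python 3) returned the parameter name ('v') and its value ('dP15zlyra3c') as a dictionary. Note that each value is a list, because a parameter may appear more than once in a query string.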
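
To see why the values come back as lists, here is a small Python 3 sketch. The query string is made up for illustration (the repeated list parameter is hypothetical, not taken from a real YouTube URL); parse_qsl is the standard-library sibling of parse_qs that returns ordered pairs instead of a dictionary:

```python
from urllib.parse import parse_qs, parse_qsl

# Hypothetical query string with a repeated parameter, for illustration only.
query = "v=dP15zlyra3c&list=A&list=B"

as_dict = parse_qs(query)    # values grouped per name, always as lists
as_pairs = parse_qsl(query)  # one (name, value) tuple per occurrence

print(as_dict)   # {'v': ['dP15zlyra3c'], 'list': ['A', 'B']}
print(as_pairs)  # [('v', 'dP15zlyra3c'), ('list', 'A'), ('list', 'B')]
```

This is why the later code indexes with [0] to get a single value.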

Getting video info : video id

Let's continue from the previous section. This time, we want to get the video id:

# Python 3: urlopen lives in urllib.request, the parsing helpers in urllib.parse
from urllib.request import urlopen
from urllib.parse import parse_qs, urlparse, unquote

youtube_watchurl = "https://www.youtube.com/watch?v=dP15zlyra3c"
q = urlparse(youtube_watchurl).query
print('query : ', q)

qs = parse_qs(urlparse(youtube_watchurl).query)
print('parse_qs : ', qs)

# The dictionary values are lists, so take the first entry for 'v'
video_id = parse_qs(urlparse(youtube_watchurl).query)['v'][0]
print('video_id : ', video_id)

We're just indexing into the returned dictionary: ['v'] selects the list of values and [0] takes its first element. Run the code:

query :  v=dP15zlyra3c
parse_qs :  {'v': ['dP15zlyra3c']}
video_id :  dP15zlyra3c
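
Indexing with ['v'][0] raises a KeyError when the URL has no v parameter. A minimal sketch of a more forgiving lookup, written for Python 3 (the function name get_video_id is hypothetical and not part of the tutorial's code):

```python
from urllib.parse import parse_qs, urlparse

def get_video_id(watch_url):
    """Return the value of the 'v' query parameter, or None if it is absent."""
    params = parse_qs(urlparse(watch_url).query)
    values = params.get('v')          # None when 'v' is missing or blank
    return values[0] if values else None

print(get_video_id("https://www.youtube.com/watch?v=dP15zlyra3c"))  # dP15zlyra3c
print(get_video_id("https://www.youtube.com/watch"))                # None
```

Returning None rather than raising is just one design choice; a caller that prefers an exception can keep the direct indexing used above.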

Now, we're about to get information about the video:

# Python 3: urlopen lives in urllib.request, the parsing helpers in urllib.parse
from urllib.request import urlopen
from urllib.parse import parse_qs, urlparse, unquote

youtube_watchurl = "https://www.youtube.com/watch?v=dP15zlyra3c"
q = urlparse(youtube_watchurl).query
print('query : ', q)

qs = parse_qs(urlparse(youtube_watchurl).query)
print('parse_qs : ', qs)

video_id = parse_qs(urlparse(youtube_watchurl).query)['v'][0]
print('video_id : ', video_id)

# Fetch the info page for this video id and decode the bytes to text
video_info = urlopen('https://www.youtube.com/get_video_info?&video_id=' + video_id).read().decode('utf-8')
print('video_info : ', video_info)

If we run it, we get a huge string for the video_info:

query :  v=dP15zlyra3c
parse_qs :  {'v': ['dP15zlyra3c']}
video_id :  dP15zlyra3c
video_info :  cl=112110886&fmt_list=...
2Fhqdefault.jpg

Getting token

Just as we retrieved the 'v' parameter value from the watch URL, we now want to retrieve the 'token' parameter value from video_info:

# Python 3: urlopen lives in urllib.request, the parsing helpers in urllib.parse
from urllib.request import urlopen
from urllib.parse import parse_qs, urlparse, unquote

youtube_watchurl = "https://www.youtube.com/watch?v=dP15zlyra3c"
q = urlparse(youtube_watchurl).query
print('query : ', q)

qs = parse_qs(urlparse(youtube_watchurl).query)
print('parse_qs : ', qs)

video_id = parse_qs(urlparse(youtube_watchurl).query)['v'][0]
print('video_id : ', video_id)

video_info = urlopen('https://www.youtube.com/get_video_info?&video_id=' + video_id).read().decode('utf-8')
print('video_info : ', video_info)

# video_info is itself a query string, so parse it the same way
token = parse_qs(unquote(video_info))['token'][0]
print('token : ', token)

In the code, the unquote(string) function (urllib.parse.unquote in Python 3, urllib.unquote in Python 2) replaces %xx escapes with their single-character equivalents. For example:

unquote('/%7Econnolly/') yields '/~connolly/'
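
One detail worth knowing: parse_qs already decodes %xx escapes in each value, so calling unquote first decodes the string twice. The Python 3 sketch below uses a made-up query string (not real get_video_info output) to show how an escaped '&' inside a value is lost when unquote runs first:

```python
from urllib.parse import parse_qs, unquote

# Hypothetical response body: the value of 'a' contains an escaped '&' (%26).
raw = "a=1%262&b=3"

decoded_once = parse_qs(raw)            # parse_qs decodes each value itself
decoded_twice = parse_qs(unquote(raw))  # unquote first, as the code above does

print(unquote('/%7Econnolly/'))  # /~connolly/
print(decoded_once)              # {'a': ['1&2'], 'b': ['3']}
print(decoded_twice)             # {'a': ['1'], 'b': ['3']}
```

Whether this matters for the token depends on what the server actually returns, which this sketch does not try to reproduce.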

Run the new code, then we get the token at the end of the output:

query :  v=dP15zlyra3c
parse_qs :  {'v': ['dP15zlyra3c']}
video_id :  dP15zlyra3c
video_info :  cl=112110886&fmt_list=...
token :  vjVQa1PpcFOak8ftRuzqanvpn_UTUC3efzEwMH2uahE=

Getting download URL

Now that we have the video id and the token string, let's put these values into the direct URL to download the video. The format of the direct URL looks like this:

http://www.youtube.com/get_video?video_id=video_id&t=token&fmt=18

So, if we construct the download URL from the video_id and the token, we have the following URL:

http://www.youtube.com/get_video?video_id=dP15zlyra3c&t=vjVQa1PpcFOak8ftRuzqanvpn_UTUC3efzEwMH2uahE=&fmt=18
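
Instead of concatenating the pieces by hand, the query string can also be built with urlencode from the standard library. This is only a sketch of that alternative in Python 3, using the video id and token shown above; note that urlencode percent-escapes the token's trailing '=' as %3D, which decodes back to the same value on the receiving side:

```python
from urllib.parse import urlencode, urlparse, parse_qs

video_id = "dP15zlyra3c"
token = "vjVQa1PpcFOak8ftRuzqanvpn_UTUC3efzEwMH2uahE="

# urlencode escapes reserved characters in each value before joining with '&'.
params = urlencode({'video_id': video_id, 't': token, 'fmt': 18})
download_url = "http://www.youtube.com/get_video?" + params
print(download_url)

# Round trip: parsing the URL again yields the original token.
print(parse_qs(urlparse(download_url).query)['t'][0] == token)  # True
```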

Final code A

# Python 3: urlopen lives in urllib.request, the parsing helpers in urllib.parse
from urllib.request import urlopen
from urllib.parse import parse_qs, urlparse, unquote

youtube_watchurl = "https://www.youtube.com/watch?v=dP15zlyra3c"
q = urlparse(youtube_watchurl).query
print('query : ', q)

qs = parse_qs(urlparse(youtube_watchurl).query)
print('parse_qs : ', qs)

video_id = parse_qs(urlparse(youtube_watchurl).query)['v'][0]
print('video_id : ', video_id)

video_info = urlopen('https://www.youtube.com/get_video_info?&video_id=' + video_id).read().decode('utf-8')
print('video_info : ', video_info)

token = parse_qs(unquote(video_info))['token'][0]
print('token : ', token)

# Build the direct URL from the values fetched above and save the response body
download_url = "http://www.youtube.com/get_video?video_id=%s&t=%s&fmt=18" % (video_id, token)
with open(video_id + '.mp4', 'wb') as f:
    f.write(urlopen(download_url).read())

However, it did not download the file. Epic failure!

Final code B

I got the following code from StackOverflow - Can't download youtube video, but it did not work, either.

#!/usr/bin/env python3

import sys
from urllib.request import Request, urlopen
from urllib.parse import urlparse, parse_qs, unquote

# Browser-like User-Agent sent with every request. The original subclassed
# FancyURLopener for this, which is deprecated, and assigning it to
# urllib.request._urlopener does not change what urlopen() sends.
USER_AGENT = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.9 Safari/533.2"

def fetch(url):
    """Return the response body of url as bytes."""
    return urlopen(Request(url, headers={'User-Agent': USER_AGENT})).read()

def youtube_download(video_url):
    video_id = parse_qs(urlparse(video_url).query)['v'][0]

    url_data = fetch('http://www.youtube.com/get_video_info?&video_id=' + video_id)
    url_info = parse_qs(unquote(url_data.decode('utf-8')))
    token_value = url_info['token'][0]

    download_url = "http://www.youtube.com/get_video?video_id={0}&t={1}&fmt=18".format(
        video_id, token_value)

    video_title = url_info['title'][0] if 'title' in url_info else ''
    # Unicode filenames are more trouble than they're worth
    filename = video_title.encode('ascii', 'ignore').decode('ascii').replace("/", "-") + '.mp4'

    print("\t Downloading '{}' to '{}'...".format(video_title, filename))

    try:
        download = fetch(download_url)
        with open(filename, 'wb') as f:
            f.write(download)
    except Exception as e:
        print("\t Download failed! {}".format(str(e)))
        print("\t Skipping...")
    else:
        print("\t Done.")

def main():
    print("\n--------------------------")
    print(" Youtube Video Downloader")
    print("--------------------------\n")

    # sys.argv[1:] never raises, so fall back to a prompt when it is empty
    video_urls = sys.argv[1:] or input('Enter (space-separated) video URLs: ').split()

    for u in video_urls:
        youtube_download(u)
    print("\n Done.")

if __name__ == '__main__':
    main()

Run it on Python3:

$ python y2.py https://www.youtube.com/watch?v=dP15zlyra3c

--------------------------
 Youtube Video Downloader
--------------------------

	 Downloading 'Fox Snow Dive - Yellowstone - BBC Two' to 'Fox Snow Dive - Yellowstone - BBC Two.mp4'...
	 Done.

 Done.

The script reported success, but only an empty mp4 file was generated!
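
The script prints "Done." because it writes whatever bytes the request returns without checking them, and an empty file means those bytes were empty. One way to surface this is sketched below in Python 3; save_if_nonempty is a hypothetical helper, not part of the StackOverflow code, and it only detects the symptom rather than fixing the download itself:

```python
def save_if_nonempty(data, filename):
    """Write data to filename only when it contains at least one byte.

    Returns True if the file was written, False otherwise.
    """
    if not data:
        return False
    with open(filename, 'wb') as f:
        f.write(data)
    return True

# Example with in-memory bytes instead of a real network response.
print(save_if_nonempty(b'', 'empty.mp4'))   # False, and no file is created
```

In the script above, the return value could decide between printing "Done." and reporting a failed download.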

Ph.D. / Golden Gate Ave, San Francisco / Seoul National Univ / Carnegie Mellon / UC Berkeley / DevOps / Deep Learning / Visualization

- K Hong
