Using the Mango TV (芒果TV) play page http://www.mgtv.com/v/10/167938/c/2942052.html as an example, this post shows how to resolve the real video address.
Extracting the video ID (vid)
The video ID can be read straight from the play-page URL. In the URL above, 2942052 is the video ID.
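A minimal sketch of this step, assuming every play-page URL ends in /<digits>.html (possibly followed by a query string such as ?from=wb, as in the share links further down). The function name extract_vid is hypothetical.

```python
import re
from typing import Optional


def extract_vid(page_url: str) -> Optional[str]:
    """Return the digits of the final '<digits>.html' path segment, or None."""
    # Assumption: play-page URLs end in /<digits>.html, optionally followed
    # by a query string or fragment (e.g. "?from=wb").
    m = re.search(r'/(\d+)\.html(?:[?#].*)?$', page_url)
    return m.group(1) if m else None


print(extract_vid('http://www.mgtv.com/v/10/167938/c/2942052.html'))  # 2942052
```

Returning None instead of raising lets a caller decide how to handle URLs that do not follow this pattern.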
Requesting the API
Request the following endpoint with the video ID as a parameter: http://v.api.mgtv.com/player/video?retry=1&video_id=2942052. It returns JSON like this:
{
  "status": 200,
  "msg": "",
  "data": {
    "user": {
      "id": "",
      "uuid": "",
      "isvip": "0",
      "purview": "200",
      "ip": "1.95.84.161"
    },
    "info": {
      "video_id": "3467275",
      "root_id": "10",
      "root_name": "新闻",
      "collection_id": "298221",
      "collection_name": "午间新闻2016",
      "clip_type": "2",
      "title": "长沙着力解决房产证遗留问题",
      "sub_title": "长沙着力解决房产证遗留问题",
      "series": "",
      "url": "http://www.hunantv.com/v/10/298221/c/3467275.html",
      "thumb": "http://2img.mgtv.com/preview/sp_images/2016/xinwen/129747/2942052/20160106130629996.jpg_220x123.jpg",
      "desc": "长沙着力解决房产证遗留问题",
      "icon": "",
      "ispayment": "0",
      "isdrm": "0",
      "isiplimit": "0",
      "duration": "37",
      "trialtime": "300",
      "scale": "16:9",
      "istry": "0",
      "price_vip": "2.5",
      "price_novip": "5",
      "price": "0"
    },
    "stream": [
      {
        "url": "/vod.do?fmt=2&pno=1000&fid=AD299930AAB40177B5F10D4AF1471071&now=1478094895&gsid=a599287b6be0403988d92fae7a7f930a&file=%2Fmp4%2F2016%2Fxinwen%2Fwjxw_46493%2FAD299930AAB40177B5F10D4AF1471071_20160106_1_1_388_mp4%2F27647495F355811D49F4A26327EC35DC.m3u8",
        "name": "标清"
      },
      {
        "url": "/vod.do?fmt=2&pno=1000&fid=CBA97B6D9BA73EB84283817B1FE38FE8&now=1478094895&gsid=7f3a643f38dc469d81fd8bc942c6aa89&file=%2Fmp4%2F2016%2Fxinwen%2Fwjxw_46493%2FCBA97B6D9BA73EB84283817B1FE38FE8_20160106_1_1_719_mp4%2FB2B29E3A4D135CB9B86B85841D12139C.m3u8",
        "name": "高清"
      },
      {
        "url": "/vod.do?fmt=2&pno=1000&fid=593B5886239791678A15C4A872A60B08&now=1478094895&gsid=79b4eb17ce3b441084b4828230fdfdd7&file=%2Fmp4%2F2016%2Fxinwen%2Fwjxw_46493%2F593B5886239791678A15C4A872A60B08_20160106_1_1_1100_mp4%2FE2017FFD796C00BB185FBC30B6D7360B.m3u8",
        "name": "超清"
      }
    ],
    "stream_domain": [
      "http://disp.titan.mgtv.com",
      "http://disp1.titan.mgtv.com",
      "http://disp2.titan.mgtv.com"
    ],
    "share": {
      "weibo": "http://service.weibo.com/share/share.php?appkey=152400019&url=http%3A%2F%2Fwww.hunantv.com%2Fv%2F10%2F298221%2Fc%2F3467275.html%3Ffrom%3Dwb&title=%E9%95%BF%E6%B2%99%E7%9D%80%E5%8A%9B%E8%A7%A3%E5%86%B3%E6%88%BF%E4%BA%A7%E8%AF%81%E9%81%97%E7%95%99%E9%97%AE%E9%A2%98&content=utf-8&pic=http://2img.mgtv.com/preview/sp_images/2016/xinwen/129747/2942052/20160106130629996.jpg_220x123.jpg",
      "weixin": "http://www.hunantv.com/share/weixin/?url=http://www.hunantv.com/v/10/298221/c/3467275.html&title=%E9%95%BF%E6%B2%99%E7%9D%80%E5%8A%9B%E8%A7%A3%E5%86%B3%E6%88%BF%E4%BA%A7%E8%AF%81%E9%81%97%E7%95%99%E9%97%AE%E9%A2%98",
      "qq": "http://connect.qq.com/widget/shareqq/index.html?title=%E9%95%BF%E6%B2%99%E7%9D%80%E5%8A%9B%E8%A7%A3%E5%86%B3%E6%88%BF%E4%BA%A7%E8%AF%81%E9%81%97%E7%95%99%E9%97%AE%E9%A2%98&desc=%E5%9C%A8%E8%8A%92%E6%9E%9CTV%E4%B8%8A%E7%9C%8B%E8%A7%81%E8%BF%99%E6%9D%A1%E8%A7%86%E9%A2%91%E8%BF%98%E4%B8%8D%E9%94%99%E5%93%A6%EF%BC%9A%E9%95%BF%E6%B2%99%E7%9D%80%E5%8A%9B%E8%A7%A3%E5%86%B3%E6%88%BF%E4%BA%A7%E8%AF%81%E9%81%97%E7%95%99%E9%97%AE%E9%A2%98&site=%E8%8A%92%E6%9E%9CTV&url=http://www.hunantv.com/v/10/298221/c/3467275.html&tpa=dW5pb25faWQ9MTAzMjUyXzEwMDAwMV8wMV8wMQ",
      "qzone": "http://sns.qzone.qq.com/cgi-bin/qzshare/cgi_qzshare_onekey?url=http://www.hunantv.com/v/10/298221/c/3467275.html&title=%E9%95%BF%E6%B2%99%E7%9D%80%E5%8A%9B%E8%A7%A3%E5%86%B3%E6%88%BF%E4%BA%A7%E8%AF%81%E9%81%97%E7%95%99%E9%97%AE%E9%A2%98"
    },
    "skipad": [
      "http://i1.hunantv.com/ui/2015/images/skipad.jpg",
      "http://order.mgtv.com/pay/pc/index.html"
    ],
    "tips": {
      "title": "会员跳广告(买12送3)",
      "url": "http://order.mgtv.com/pay/pc/index.html",
      "img": ""
    },
    "frame": {
      "second": [],
      "images": []
    },
    "points": {
      "start": [],
      "end": [],
      "content": []
    },
    "next": {
      "title": "长沙2016春运“两证”核发",
      "img": "http://0img.mgtv.com/preview/sp_images/2016/xinwen/129747/2942051/20160106125233250.jpg_220x123.jpg",
      "url": "/v/10/298221/c/3467257.html",
      "duration": "23"
    }
  }
}
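The two pieces of this response that matter are data.stream and data.stream_domain. A small sketch of building the request URL and pulling out the quality labels, assuming a top-level "status" of 200 means success (inferred only from this sample). The helper names player_api_url and list_streams are hypothetical, and the sample below is an abbreviated copy of the response with the stream urls shortened.

```python
import json
from urllib.parse import urlencode

API_URL = 'http://v.api.mgtv.com/player/video'


def player_api_url(vid: str) -> str:
    """Build the player API request URL for a video ID."""
    return API_URL + '?' + urlencode({'retry': 1, 'video_id': vid})


def list_streams(resp: dict) -> dict:
    """Map each quality label in data.stream to its relative url."""
    # Assumption: a top-level "status" of 200 means success, as in the sample.
    if resp.get('status') != 200:
        raise ValueError('player API error: %r' % resp.get('msg'))
    return {s['name']: s['url'] for s in resp['data']['stream']}


# Abbreviated copy of the response above (stream urls shortened).
sample = json.loads('''{"status": 200, "msg": "", "data": {
  "stream": [{"url": "/vod.do?fid=AD29", "name": "标清"},
             {"url": "/vod.do?fid=593B", "name": "超清"}],
  "stream_domain": ["http://disp.titan.mgtv.com"]}}''')
print(player_api_url('2942052'))
print(list(list_streams(sample)))
```

Keying by the quality label makes it easy to pick, for example, list_streams(resp)['超清'].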
The stream list offers three quality levels (标清 SD, 高清 HD and 超清 Super HD), and stream_domain lists three domains. Pick the url of one quality level, say 超清, and prefix it with any one of the three domains to form an address such as:
http://disp.titan.mgtv.com/vod.do?fmt=2&pno=1000&fid=593B5886239791678A15C4A872A60B08&now=1478094895&gsid=79b4eb17ce3b441084b4828230fdfdd7&file=%2Fmp4%2F2016%2Fxinwen%2Fwjxw_46493%2F593B5886239791678A15C4A872A60B08_20160106_1_1_1100_mp4%2FE2017FFD796C00BB185FBC30B6D7360B.m3u8
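The prefixing step can be sketched with the standard library's urljoin. Because each stream url is an absolute path ("/vod.do?..."), urljoin keeps only the scheme and host of the chosen domain and appends the path and query unchanged. Picking the domain at random mirrors the "any one of the three" remark above; build_dispatch_url is a hypothetical name.

```python
import random
from urllib.parse import urljoin


def build_dispatch_url(stream_domains, stream_url, rng=random):
    """Prefix a stream's relative url with one of the stream_domain hosts."""
    domain = rng.choice(stream_domains)  # any of the listed domains
    # stream_url is an absolute path ("/vod.do?..."), so urljoin keeps only
    # the domain's scheme and host and appends the path and query as-is.
    return urljoin(domain, stream_url)


domains = ['http://disp.titan.mgtv.com', 'http://disp1.titan.mgtv.com',
           'http://disp2.titan.mgtv.com']
print(build_dispatch_url(domains, '/vod.do?fmt=2&pno=1000'))
```

Passing a seeded random.Random as rng makes the domain choice reproducible in tests.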
Requesting this address also returns JSON:
{
  "ver": "2015-02-25 v1.0.1",
  "isothercdn": "1",
  "info": "http://pcvideoaliyun.titan.mgtv.com/mp4/2016/xinwen/wjxw_46493/593B5886239791678A15C4A872A60B08_20160106_1_1_1100_mp4/E2017FFD796C00BB185FBC30B6D7360B.m3u8?uuid=66e980da9e284bbe8222cae71fcc650a&t=581a6306&pno=1000&sign=ccedb70160e9136870fbd72864f11bb2&win=3600&srgid=25021&urgid=81&srgids=25021&nid=25021&payload=usertoken%3dhit%3d0%5eruip%3d23024801&rdur=21600&arange=0&limitrate=0&fid=593B5886239791678A15C4A872A60B08&ver=0x03",
  "status": "ok",
  "loc": "81-1.95.84.161",
  "t": "1478095494",
  "idc": "aliyun_cdn"
}
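The useful field here is info, which holds the m3u8 playlist URL. A minimal sketch of extracting it, assuming "status": "ok" marks success (based only on the sample above). The function name m3u8_url_from_dispatch is hypothetical, and the sample's info URL is shortened.

```python
def m3u8_url_from_dispatch(resp: dict) -> str:
    """Return the m3u8 playlist URL from a vod.do dispatch response."""
    # Assumption: "status": "ok" marks success, based only on the sample above.
    if resp.get('status') != 'ok':
        raise ValueError('dispatch failed: %r' % resp.get('status'))
    return resp['info']


sample = {'status': 'ok', 'idc': 'aliyun_cdn',
          'info': 'http://pcvideoaliyun.titan.mgtv.com/mp4/x/E2017FFD.m3u8?pno=1000'}
print(m3u8_url_from_dispatch(sample))
```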
The info field is the URL of an m3u8 playlist. Requesting it returns:
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:11
#EXT-X-MEDIA-SEQUENCE:0
#EXT-MGTV-VIDEO-WIDTH:1280
#EXT-MGTV-VIDEO-HEIGHT:720
#EXT-MGTV-File-SIZE:699360
#EXTINF:5.080000,
5976B00263913818AE86F0122D7B766F_0_5080_1101_v02_mp4.ts?uuid=66e980da9e284bbe8222cae71fcc650a&t=581a6306&pno=1000&sign=ccedb70160e9136870fbd72864f11bb2&win=3600&srgid=25021&urgid=81&srgids=25021&nid=25021&payload=usertoken%3dhit%3d0%5eruip%3d23024801&rdur=21600&arange=0&limitrate=0&fid=593B5886239791678A15C4A872A60B08&ver=0x03
#EXT-MGTV-File-SIZE:512488
#EXTINF:5.000000,
01FC81B6CE4051E6CA9D990A72FF9B61_5080_10080_819_v02_mp4.ts?uuid=66e980da9e284bbe8222cae71fcc650a&t=581a6306&pno=1000&sign=ccedb70160e9136870fbd72864f11bb2&win=3600&srgid=25021&urgid=81&srgids=25021&nid=25021&payload=usertoken%3dhit%3d0%5eruip%3d23024801&rdur=21600&arange=0&limitrate=0&fid=593B5886239791678A15C4A872A60B08&ver=0x03
#EXT-MGTV-File-SIZE:1392140
#EXTINF:10.000000,
E1A9A55392CA363DD79F9F34C5AB1D45_10080_20080_1113_v02_mp4.ts?uuid=66e980da9e284bbe8222cae71fcc650a&t=581a6306&pno=1000&sign=ccedb70160e9136870fbd72864f11bb2&win=3600&srgid=25021&urgid=81&srgids=25021&nid=25021&payload=usertoken%3dhit%3d0%5eruip%3d23024801&rdur=21600&arange=0&limitrate=0&fid=593B5886239791678A15C4A872A60B08&ver=0x03
#EXT-MGTV-File-SIZE:2260136
#EXTINF:10.000000,
645A04A7B53530E5FA5CB5FCCF772A35_20080_30080_1808_v02_mp4.ts?uuid=66e980da9e284bbe8222cae71fcc650a&t=581a6306&pno=1000&sign=ccedb70160e9136870fbd72864f11bb2&win=3600&srgid=25021&urgid=81&srgids=25021&nid=25021&payload=usertoken%3dhit%3d0%5eruip%3d23024801&rdur=21600&arange=0&limitrate=0&fid=593B5886239791678A15C4A872A60B08&ver=0x03
#EXT-MGTV-File-SIZE:926652
#EXTINF:7.080000,
B7D675030CAB254C04330380F8A8A2AF_30080_37160_1047_v02_mp4.ts?uuid=66e980da9e284bbe8222cae71fcc650a&t=581a6306&pno=1000&sign=ccedb70160e9136870fbd72864f11bb2&win=3600&srgid=25021&urgid=81&srgids=25021&nid=25021&payload=usertoken%3dhit%3d0%5eruip%3d23024801&rdur=21600&arange=0&limitrate=0&fid=593B5886239791678A15C4A872A60B08&ver=0x03
#EXT-X-ENDLIST
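This is an HLS media playlist: #EXTM3U, #EXTINF and #EXT-X-ENDLIST are standard HLS tags, while the #EXT-MGTV-* lines are not part of the HLS specification and this sketch simply skips them. It pairs each segment line with the duration from the #EXTINF tag before it. The name parse_playlist is hypothetical, and the sample is an abbreviated copy of the playlist above with the query strings dropped.

```python
def parse_playlist(text: str):
    """Return (duration_seconds, uri) pairs from an HLS media playlist."""
    segments = []
    duration = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith('#EXTINF:'):
            # "#EXTINF:5.080000," -> 5.08; text after the comma is an optional title
            duration = float(line[len('#EXTINF:'):].split(',', 1)[0])
        elif not line.startswith('#'):
            # Any other non-'#' line is a segment URI for the preceding #EXTINF
            segments.append((duration, line))
            duration = None
        # Remaining '#' lines (including #EXT-MGTV-*) are ignored
    return segments


# Abbreviated version of the playlist above (query strings dropped).
playlist = """#EXTM3U
#EXT-X-VERSION:3
#EXT-MGTV-File-SIZE:699360
#EXTINF:5.080000,
5976B00263913818AE86F0122D7B766F_0_5080_1101_v02_mp4.ts
#EXT-MGTV-File-SIZE:512488
#EXTINF:5.000000,
01FC81B6CE4051E6CA9D990A72FF9B61_5080_10080_819_v02_mp4.ts
#EXT-X-ENDLIST
"""
segs = parse_playlist(playlist)
print(len(segs), sum(d for d, _ in segs))
```

Summing the durations of the full playlist gives about 37.16 s, consistent with the "duration": "37" field in the player API response.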
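The lines that do not start with # are partial segment addresses; the full segment URLs still have to be constructed. How? With the browser's developer tools, we can see that a final segment URL looks like this: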
http://pcvideoaliyun.titan.mgtv.com/mp4/2016/xinwen/wjxw_46493/CBA97B6D9BA73EB84283817B1FE38FE8_20160106_1_1_719_mp4/B4FBE7EEE4C4E58313038787AFA1CBF7_30080_37160_773_v02_mp4.ts?uuid=b6e729c483cb4ebfa2ca6cccc08bacff&t=581a661e&pno=1000&sign=b454c0a00e9240c96c877662329699cb&win=3600&srgid=25021&urgid=81&srgids=25021&nid=25021&payload=usertoken%3dhit%3d0%5eruip%3d23024801&rdur=21600&arange=0&limitrate=0&fid=CBA97B6D9BA73EB84283817B1FE38FE8&ver=0x03&r=309659237103548&tt=3167&yfmode=3
while the m3u8 URL looks like this:
http://pcvideoaliyun.titan.mgtv.com/mp4/2016/xinwen/wjxw_46493/593B5886239791678A15C4A872A60B08_20160106_1_1_1100_mp4/E2017FFD796C00BB185FBC30B6D7360B.m3u8?uuid=66e980da9e284bbe8222cae71fcc650a&t=581a6306&pno=1000&sign=ccedb70160e9136870fbd72864f11bb2&win=3600&srgid=25021&urgid=81&srgids=25021&nid=25021&payload=usertoken=hit=0%5Eruip=23024801&rdur=21600&arange=0&limitrate=0&fid=593B5886239791678A15C4A872A60B08&ver=0x03
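Notice that the two URLs start the same way: scheme, host and directory path. (The captured segment happens to belong to the 高清 stream, so its last directory is CBA97B6D9BA73EB84283817B1FE38FE8_20160106_1_1_719_mp4 rather than the 超清 one, and the browser also appends r, tt and yfmode parameters that are not in the playlist line.) So a full segment URL is the directory of the m3u8 URL followed by one of the non-# lines from the playlist.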
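Instead of splicing the scheme, host and os.path.dirname(path) together by hand, the standard library's urljoin can do this: RFC 3986 reference resolution drops the base URL's last path component and query string and appends the relative reference. It also returns absolute segment URIs unchanged. The name segment_url is hypothetical, and the URLs below are shortened versions of the ones above.

```python
from urllib.parse import urljoin


def segment_url(m3u8_url: str, segment_line: str) -> str:
    """Resolve a playlist line against the m3u8 URL it came from."""
    # Drops "E2017....m3u8?..." from the base and appends the segment line,
    # whose own query string is kept as-is.
    return urljoin(m3u8_url, segment_line)


base = ('http://pcvideoaliyun.titan.mgtv.com/mp4/2016/xinwen/wjxw_46493/'
        '593B5886239791678A15C4A872A60B08_20160106_1_1_1100_mp4/'
        'E2017FFD796C00BB185FBC30B6D7360B.m3u8?pno=1000')
print(segment_url(base, '5976B00263913818AE86F0122D7B766F_0_5080_1101_v02_mp4.ts?pno=1000'))
```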
Python code example
import random
import re
from urllib.parse import urljoin

import requests

page_url = 'http://www.mgtv.com/v/10/167938/c/2942052.html'
# The video ID is the number right before ".html"
vid = re.search(r'/(\d+)\.html', page_url).group(1)

# Ask the player API for the stream list and the dispatch domains
r = requests.get('http://v.api.mgtv.com/player/video',
                 params={'retry': 1, 'video_id': vid})
data = r.json()['data']
streams = data['stream']
stream_domains = data['stream_domain']

for stream in streams:
    print(stream['name'])  # quality label: 标清 / 高清 / 超清
    # Any one of the dispatch domains + the stream's relative url
    url = urljoin(random.choice(stream_domains), stream['url'])
    # The dispatch response's "info" field is the m3u8 playlist URL
    m3u8_url = requests.get(url).json()['info']
    playlist = requests.get(m3u8_url).text
    # Non-blank lines that do not start with '#' are segment URIs
    for s in (line.strip() for line in playlist.splitlines()):
        if s and not s.startswith('#'):
            # Segment URIs are relative to the m3u8 file's directory
            print(urljoin(m3u8_url, s))