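This post covers how to resolve the real video address on the NetEase News mobile site, in two cases:
- Standalone video playback pages
- Rich-media pages that contain a video
Standalone video playback pages
Take the playback page http://c.m.163.com/news/v/VC3A7L2UU.html as an example of how to obtain the real video address.
In this case the real video address is already present in the page source: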
<article class="m-video-player"><div class="video-holder"><div class="video-wrap js-video">
<video src="http://flv2.bn.netease.com/videolib3/1610/25/wnjIA1492/SD/wnjIA1492-mobile.mp4" type="video/mp4" webkit-playsinline="true" controls preload="none"></video>
...
</article>
The value of the video element's src attribute is the real video address.
Python example:
import requests
from pyquery import PyQuery as pq
# Send a mobile browser User-Agent so the server returns the mobile page
headers = {'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36'}
r = requests.get('http://c.m.163.com/news/v/VC3A7L2UU.html', headers=headers)
d = pq(r.content)
# The src attribute of the <video> element is the real video address
print(d('video').attr('src'))
Note that the example sends a mobile browser User-Agent request header when requesting the mobile page.
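If pyquery is not installed, the same extraction can be sketched with the standard library's html.parser. This is only an illustration: the class name VideoSrcParser, the helper extract_video_sources, and the SAMPLE_HTML string (a trimmed copy of the snippet shown earlier) are hypothetical names introduced here, and the sketch assumes the address sits in the src attribute of a video tag as in that snippet.

```python
from html.parser import HTMLParser


class VideoSrcParser(HTMLParser):
    """Collect the src attribute of every <video> start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; bare attributes such as
        # "controls" arrive with a value of None.
        if tag == "video":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_video_sources(html):
    """Return the list of video src values found in an HTML string."""
    parser = VideoSrcParser()
    parser.feed(html)
    return parser.sources


# Trimmed copy of the markup from the playback page shown above.
SAMPLE_HTML = (
    '<article class="m-video-player"><div class="video-holder">'
    '<div class="video-wrap js-video">'
    '<video src="http://flv2.bn.netease.com/videolib3/1610/25/wnjIA1492/SD/wnjIA1492-mobile.mp4" '
    'type="video/mp4" webkit-playsinline="true" controls preload="none"></video>'
    '</div></div></article>'
)

print(extract_video_sources(SAMPLE_HTML))
```

To use it against the live page, pass r.text from the requests call in place of SAMPLE_HTML.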
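Rich-media pages that contain a video
Take the rich-media page http://c.m.163.com/news/a/C42K63S1000181N1.html as an example of how to obtain the real video address.
This time we are not so lucky: the page source does not contain the video address. Opening the page in a desktop browser and clicking the video player redirects to a prompt to download the NetEase News app. The only option left is to capture traffic on the mobile side. I opened the page in a browser inside an Android emulator while capturing HTTP packets with Wireshark, inspected the captured packets one by one, and found the following API request: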
http://c.m.163.com/nc/article/C42K63S1000181N1/full.html
The C42K63S1000181N1 segment of this request URL is the document ID taken from the rich-media page URL.
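The mapping from page URL to API URL can be sketched as a small helper. The function name article_api_url and the constant API_TEMPLATE are hypothetical, and the sketch assumes the document ID is always the file name of the page URL without its .html extension, as in the example above.

```python
import posixpath
from urllib.parse import urlparse

# Endpoint pattern observed in the captured request.
API_TEMPLATE = "http://c.m.163.com/nc/article/%s/full.html"


def article_api_url(page_url):
    """Derive the article JSON endpoint from a rich-media page URL."""
    path = urlparse(page_url).path            # query string is dropped here
    filename = posixpath.basename(path)       # e.g. 'C42K63S1000181N1.html'
    doc_id, ext = posixpath.splitext(filename)
    if not doc_id or ext != ".html":
        raise ValueError("unexpected page URL: %r" % page_url)
    return API_TEMPLATE % doc_id


print(article_api_url("http://c.m.163.com/news/a/C42K63S1000181N1.html"))
```

Parsing the URL rather than matching the raw string means a share link with a query string appended resolves to the same endpoint.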
The JSON returned by this API is as follows:
{
"C42K63S1000181N1": {
"body": "<p> 美国大选这场戏,我已经看不懂了……</p><p> 美国名流大多支持希拉里,这事儿就不提了</p><p> 但支持的程度,已经近乎癫狂</p><p> 比如</p><p> 麦当娜</p><p> 10月21日,麦当娜在演唱会上说:</p><p> <strong>“如果谁给希拉里投票,我就亲自帮他blowjob。”</strong></p><!--IMG#0--><!--IMG#1--><!--IMG#2--><p> 说完之后,粉丝一片咋呼声</p><p> 麦当娜又继续说,“我没开玩笑,我发誓。”</p><p> 并且她强调了两遍“<strong>I m good</strong>”</p><p> ……</p><p> 算了,啥也不说了</p><p> 血槽已空</p><p> 直接看视频吧</p><p><!--VIDEO#0--></p><p> <strong>(相信不需要字幕 你们也能完全领会)</strong></p><p> 这段视频迅速在网上传开,看热闹的群众越来越多</p><p> 很多人@麦当娜——<strong>已投希拉里,请兑奖</strong>。</p><!--IMG#3--><!--IMG#4--><p> 有人干脆晒出了自己的选票作为证据</p><!--IMG#5--><p> 还有人说,“如果你不兑现承诺,宝宝要投特朗普了!”</p><!--IMG#6--><p> 所以,麦当娜这到底是给谁拉票了?</p><!--IMG#7--><p> 麦当娜今年58岁</p><p> 一直是希拉里的坚定支持者</p><p> 多次在社交网络上po出两人合照</p><!--IMG#8--><p> 9月底,麦当娜在Instagram发布了一张只露出下半边脸、胸部以上赤裸的照片,</p><p> 并在旁边写道:“我跟凯蒂·佩里一起赤裸着为希拉里投票,希拉里是最棒的!”</p><!--IMG#9--><p> 这条ins仅在45分钟内就收获12000个赞</p><p> 而在她之前一天,<strong>水果姐</strong>发布了<strong>全裸</strong>的拉票视频</p><p> 头上粘着棒棒糖,身体赤裸,表情搞怪,拼尽全力为希拉里拉票。</p><!--IMG#10--><p> 除此之外,麦当娜还发了一张2015年在全英音乐节上的表演照片,</p><p> 照片中希拉里参与了她的演出,旁边她配文说,为希拉里的才华、对女性权利的支持而投票。</p><!--IMG#11--><p> 真的</p><p> 服了。</p><p> <b>版权声明:本文系“城墙上的守夜人”(公众号:watcher_on_the_wall)授权哒哒发表,如需转载请联系“城墙上的守夜人”获取授权,严禁私自进行二次转载,违者必究</b></p><p><!--link0--></p>",
"users": [],
"ydbaike": [],
"replyCount": 16755,
"link": [
{
"ref": "<!--link0-->",
"title": "网易哒哒-世界没那么正经",
"imgsrc": "http://img4.cache.netease.com/cnews/2016/5/23/201605231541340f7b1.png",
"digest": "iOS及安卓版均已上线,点击下载体验",
"type": "linkCard",
"href": "http://d.news.163.com/pages/download.html?channel=newsDetailBottom"
}
],
"img": [
{
"ref": "<!--IMG#0-->",
"pixel": "474*265",
"alt": "",
"src": "http://dmr.nosdn.127.net/TOUM07FQjUrtgMX8Nh1Smg==/6896093022622558003.jpg"
},
{
"ref": "<!--IMG#1-->",
"pixel": "468*260",
"alt": "",
"src": "http://dmr.nosdn.127.net/Y90O79U0CCmoacrYu_yIWg==/6896093022622553915.jpg"
},
{
"ref": "<!--IMG#2-->",
"pixel": "471*270",
"alt": "",
"src": "http://dmr.nosdn.127.net/tODi540REw5fd3XVYxlvWQ==/6896093022622552995.jpg"
},
{
"ref": "<!--IMG#3-->",
"pixel": "541*85",
"alt": "",
"src": "http://dmr.nosdn.127.net/5SFpF4myP0Xnrx3NRIddxA==/6896093022622557139.jpg"
},
{
"ref": "<!--IMG#4-->",
"pixel": "581*101",
"alt": "",
"src": "http://dmr.nosdn.127.net/wPkpPCckLjPmEhwxN0n4ow==/6896093022622557594.jpg"
},
{
"ref": "<!--IMG#5-->",
"pixel": "575*316",
"alt": "",
"src": "http://dmr.nosdn.127.net/QRo-3fN4ek7GdVJ_9M-WaQ==/6896093022622556221.jpg"
},
{
"ref": "<!--IMG#6-->",
"pixel": "518*166",
"alt": "",
"src": "http://dmr.nosdn.127.net/wcoS0VWDBNYHRLkG68W-Jg==/6896093022622557596.jpg"
},
{
"ref": "<!--IMG#7-->",
"pixel": "647*676",
"alt": "",
"src": "http://dmr.nosdn.127.net/gVHdkhLcBImVWC4aA-uNDg==/6896093022622556639.jpg"
},
{
"ref": "<!--IMG#8-->",
"pixel": "583*568",
"alt": "",
"src": "http://dmr.nosdn.127.net/7cVzjO-rYx2PuZGD1En0AA==/6896093022622554365.jpg"
},
{
"ref": "<!--IMG#9-->",
"pixel": "600*604",
"alt": "",
"src": "http://dmr.nosdn.127.net/aqSmN6RzjbBGdweQcUlC5w==/6896093022622557600.jpg"
},
{
"ref": "<!--IMG#10-->",
"pixel": "600*338",
"alt": "",
"src": "http://dmr.nosdn.127.net/JobRmBEqf1E5z_mDUayr3Q==/6896093022622556644.jpg"
},
{
"ref": "<!--IMG#11-->",
"pixel": "600*730",
"alt": "",
"src": "http://dmr.nosdn.127.net/RTphdMQgPazJ8Y99Rgpwcg==/6896093022622557117.jpg"
}
],
"votes": [],
"shareLink": "http://c.m.163.com/news/a/C42K63S1000181N1.html?spss=newsapp&spsw=1",
"digest": "",
"topiclist_news": [
{
"hasCover": false,
"subnum": "超过1000万",
"alias": "dada",
"tname": "哒哒趣闻",
"ename": "dada",
"tid": "T1444289532601",
"cid": "C1348647991705"
}
],
"dkeys": "麦当娜,希拉里",
"ec": "宋欣蔚_NX2107",
"topiclist": [],
"docid": "C42K63S1000181N1",
"sourceinfo": {
"alias": "我是个好奇好玩好色的家伙",
"ename": "T1472457096458",
"tname": "城墙上的守夜人",
"tid": "T1472457096458"
},
"picnews": true,
"title": "麦当娜说投希拉里,她就给你XX,而且还会吞下去",
"tid": "",
"video": [
{
"commentid": "C34Q847R008535RB",
"topicid": "1000",
"broadcast": "in",
"videosource": "新媒体",
"commentboard": "video_bbs",
"appurl": "",
"mp4Hd_url": null,
"url_m3u8": "http://flv2.bn.netease.com/videolib3/1610/23/KdLCA9629/SD/KdLCA9629-mobile.mp4",
"size": "",
"ref": "<!--VIDEO#0-->",
"cover": "http://vimg3.ws.126.net/image/snapshot/2016/10/7/S/VC34Q847S.jpg",
"url_mp4": "http://flv2.bn.netease.com/videolib3/1610/23/KdLCA9629/SD/KdLCA9629-mobile.mp4",
"alt": "麦当娜:谁投希拉里,我就给谁XX",
"mp4_url": "http://flv2.bn.netease.com/videolib3/1610/23/KdLCA9629/SD/KdLCA9629-mobile.mp4",
"m3u8Hd_url": null,
"m3u8_url": "http://flv2.bn.netease.com/videolib3/1610/23/KdLCA9629/SD/movie_index.m3u8",
"vid": "VC34Q847R"
}
],
"template": "normal",
"threadVote": 9,
"threadAgainst": 6,
"boboList": [],
"replyBoard": "news2_bbs",
"source": "城墙上的守夜人",
"hasNext": false,
"voicecomment": "off",
"relative_sys": [
{
"id": "C4MRFFS6051784VS",
"title": "【时尚早八卦】范冰冰李晨H&M17年春节广告照出炉甜!水果姐cos希拉里,相似度吓人!",
"source": "橘子娱乐",
"imgsrc": "http://cms-bucket.nosdn.127.net/c50adfa6c9b14b3b8c8006ed5e9a3fa820161031115033.png",
"docID": "C4MRFFS6051784VS",
"from": "BJ",
"type": "doc",
"ptime": "2016-10-31 10:30:10",
"href": ""
},
{
"id": "C480LUBS000181N5",
"title": "裸奔、BJ...明星们为支持美国大选都疯了吗?",
"source": "哒哒良品",
"imgsrc": "http://cms-bucket.nosdn.127.net/adc36b5eafea40b5916e76ce488f78cb20161025164014.jpeg",
"docID": "C480LUBS000181N5",
"from": "BJ",
"type": "doc",
"ptime": "2016-10-25 15:36:22",
"href": ""
},
{
"id": "C4L7T9RB0519A583",
"title": "美国大选能否再现退欧逆转?高盛:不要想太多",
"source": "微盘游戏",
"imgsrc": "http://dingyue.nosdn.127.net/h9eeQ74Vws6qc9vrHbsInUAn7GqWy8NoaghLPCZ5UA7Sl1477824765529.jpg",
"docID": "C4L7T9RB0519A583",
"from": "BJ",
"type": "doc",
"ptime": "2016-10-30 21:21:13",
"href": ""
}
],
"ptime": "2016-10-23 13:21:48"
}
}
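In the JSON above, the video list holds the real video addresses. mp4Hd_url is the HD address and mp4_url is the SD address; either of the two may be missing (in this response mp4Hd_url is null).
Putting this together, a Python example: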
import re
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36'}
page_url = 'http://c.m.163.com/news/a/C42K63S1000181N1.html'
# Extract the document ID from the page URL (note the escaped dot)
m = re.search(r'/(\w+)\.html', page_url)
doc_id = m.group(1)
api_url = 'http://c.m.163.com/nc/article/%s/full.html' % doc_id
r = requests.get(api_url, headers=headers)
data = r.json()
# The response is keyed by the document ID
videos = data[doc_id]['video']
for video in videos:
    print(video.get('mp4Hd_url'))
    print(video.get('mp4_url'))
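Because either mp4Hd_url or mp4_url may be missing, a caller usually wants a single usable address rather than both fields. A minimal sketch, assuming HD is preferred whenever it is available; the helper name pick_video_url is hypothetical, and sample_video is a trimmed copy of the video entry from the response above.

```python
def pick_video_url(video):
    """Return the HD MP4 address if present, otherwise the SD one, else None."""
    # In the sample response a missing address is null (None in Python).
    # An empty string is treated the same way here as a precaution.
    return video.get("mp4Hd_url") or video.get("mp4_url") or None


# Trimmed copy of the video entry from the sample response.
sample_video = {
    "mp4Hd_url": None,
    "mp4_url": "http://flv2.bn.netease.com/videolib3/1610/23/KdLCA9629/SD/KdLCA9629-mobile.mp4",
    "vid": "VC34Q847R",
}

print(pick_video_url(sample_video))
```

Calling pick_video_url(video) on each entry of the video list yields one address per video, or None when neither field is set.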