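Press F12 to open the browser's developer tools and switch to the Network tab.
Select the request for the video page: its request URL is exactly the address of this video.
In its Response, the video and audio entries hold the video stream and the audio stream, and each video entry also lists its height and width.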
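To see how the height and width fields can be used, here is a minimal sketch that picks the highest-resolution entry from the video list. It assumes each entry is a dict with width, height and baseUrl keys, as in the json_data['data']['dash']['video'] list used in the full code later on; pick_largest_video is a hypothetical helper and the sample data is made up.

```python
# Hypothetical helper: choose the video stream with the most pixels.
# Assumes each entry has "width", "height" and "baseUrl" keys, like the
# entries in json_data['data']['dash']['video'].


def pick_largest_video(videos):
    """Return the entry with the largest width * height."""
    if not videos:
        raise ValueError("no video streams found")
    return max(videos, key=lambda v: v.get("width", 0) * v.get("height", 0))


# Made-up sample entries, for illustration only
streams = [
    {"width": 852, "height": 480, "baseUrl": "https://example.com/480.m4s"},
    {"width": 1920, "height": 1080, "baseUrl": "https://example.com/1080.m4s"},
    {"width": 1280, "height": 720, "baseUrl": "https://example.com/720.m4s"},
]
best = pick_largest_video(streams)
print(best["height"], best["baseUrl"])
```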
Requesting these URLs returns the stream data, which you then simply write to local files.
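The scripts in this tutorial read each file fully into memory with .content. For long videos a streamed download is gentler on memory. The sketch below assumes the requests package; write_chunks and download are hypothetical helper names, not part of the original code.

```python
# A minimal sketch of a streamed download, assuming the `requests` package.
# Instead of holding the whole body in memory (response.content),
# it writes the body to disk chunk by chunk.


def write_chunks(chunks, path):
    """Write an iterable of bytes chunks to `path`; return the bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    return written


def download(url, path, headers, chunk_size=1024 * 1024):
    """Stream `url` to `path` using the same headers as the tutorial."""
    import requests  # imported here so write_chunks works without requests installed

    with requests.get(url, headers=headers, stream=True, timeout=30) as resp:
        resp.raise_for_status()  # fail instead of saving an error page
        return write_chunks(resp.iter_content(chunk_size=chunk_size), path)
```

Usage would look like download(video_url, 'video.mp4', headers), with the video_url and headers from the scripts below.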
This also shows that Bilibili serves video and audio as separate streams, so after downloading them you need to merge the two.
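The tutorial does not show the merge step itself. One common option, assumed here rather than taken from the original, is the ffmpeg command-line tool, which must be installed and on PATH; with -c copy it combines the two streams into one file without re-encoding. build_merge_command and merge are hypothetical helpers.

```python
# A minimal sketch of merging the separate video and audio files with ffmpeg.
# Assumption: the ffmpeg CLI is installed and available on PATH.
import subprocess


def build_merge_command(video_path, audio_path, output_path):
    """Build an ffmpeg command that muxes both inputs without re-encoding."""
    return [
        "ffmpeg",
        "-y",              # overwrite output_path if it already exists
        "-i", video_path,  # first input: the video-only file
        "-i", audio_path,  # second input: the audio-only file
        "-c", "copy",      # copy the streams as-is instead of re-encoding
        output_path,
    ]


def merge(video_path, audio_path, output_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports an error."""
    subprocess.run(build_merge_command(video_path, audio_path, output_path), check=True)
```

After both downloads finish, a call such as merge(video_file, audio_file, 'merged.mp4') produces a single file with picture and sound.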
The Referer and User-Agent values under Headers are used in the code.
Scraper code:
import requests
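# Video page URL (for reference only; this first version downloads the stream URLs directly)
url = 'https://www.bilibili.com/video/BV1H94y1k7JU/'
headers = {
    # Referer (anti-hotlinking): tells the server which page the request came from
    "Referer": 'https://space.bilibili.com/5970160?spm_id_from=333.337.0.0',
    # User-Agent: basic identity information about the browser/device
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36"
}
# Video stream URL, copied from the F12 Network panel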
video_url = 'https://xy115x56x242x13xy.mcdn.bilivideo.cn:8082/v1/resource/1232379231_x1-1-100113.m4s?agrr=0&build=0&buvid=680A9D46-F8B8-154A-F434-C6DEE500981093653infoc&bvc=vod&bw=111134&deadline=1725512168&e=ig8euxZM2rNcNbdlhoNvNC8BqJIzNbfqXBvEqxTEto8BTrNvN0GvT90W5JZMkX_YN0MvXg8gNEV4NC8xNEV4N03eN0B5tZlqNxTEto8BTrNvNeZVuJ10Kj_g2UB02J0mN0B5tZlqNCNEto8BTrNvNC7MTX502C8f2jmMQJ6mqF2fka1mqx6gqj0eN0B599M%3D&f=u_0_0&gen=playurlv2&logo=A0020000&mcdnid=50009080&mid=514369589&nbs=1&nettype=0&og=cos&oi=1033450500&orderid=0%2C3&os=mcdn&platform=pc&sign=34885b&traceid=trfMCLayAzguxh_0_e_N&uipk=5&uparams=e%2Cuipk%2Cnbs%2Cdeadline%2Cgen%2Cos%2Coi%2Ctrid%2Cmid%2Cplatform%2Cog&upsig=5d40c39e0b4d802b2a49d32dd5ca67f2'
video_response = requests.get(video_url, headers=headers)
video_response.raise_for_status()  # stop if the request failed, e.g. 403 or an expired URL
with open('shipin.mp4', mode='wb') as v:
    v.write(video_response.content)
# Audio stream URL, copied from the F12 Network panel
audio_url = 'https://xy122x228x71x25xy.mcdn.bilivideo.cn:8082/v1/resource/1232379231-1-30280.m4s?agrr=0&build=0&buvid=680A9D46-F8B8-154A-F434-C6DEE500981093653infoc&bvc=vod&bw=20469&deadline=1725512168&e=ig8euxZM2rNcNbdlhoNvNC8BqJIzNbfqXBvEqxTEto8BTrNvN0GvT90W5JZMkX_YN0MvXg8gNEV4NC8xNEV4N03eN0B5tZlqNxTEto8BTrNvNeZVuJ10Kj_g2UB02J0mN0B5tZlqNCNEto8BTrNvNC7MTX502C8f2jmMQJ6mqF2fka1mqx6gqj0eN0B599M%3D&f=u_0_0&gen=playurlv2&logo=A0020000&mcdnid=50009080&mid=514369589&nbs=1&nettype=0&og=hw&oi=1033450500&orderid=0%2C3&os=mcdn&platform=pc&sign=f24dda&traceid=trbqZtftzpHYsQ_0_e_N&uipk=5&uparams=e%2Cuipk%2Cnbs%2Cdeadline%2Cgen%2Cos%2Coi%2Ctrid%2Cmid%2Cplatform%2Cog&upsig=1eb777301bb6256e0ceaf6fe84d2f946'
audio_response = requests.get(audio_url, headers=headers)
audio_response.raise_for_status()  # stop if the request failed
with open('yinpin.mp3', mode='wb') as a:
    a.write(audio_response.content)
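Running the script saves the video stream and the audio stream as two separate files in the working directory.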
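Drawback of the code above: the video and audio URLs have to be copied by hand, and since each one carries an expiry timestamp (the deadline parameter), a copied URL only works for a limited time.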
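To check whether a copied URL is still usable, you can read its deadline parameter. This sketch assumes deadline is a Unix timestamp (seconds) marking when the signed URL expires; url_deadline and is_expired are hypothetical helpers, and the sample URL is a shortened, made-up one that only keeps the deadline value from the URLs above.

```python
# A minimal sketch for reading the expiry of a copied stream URL.
# Assumption: "deadline" is a Unix timestamp after which the URL stops working.
import time
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs


def url_deadline(url):
    """Return the deadline query parameter as an int, or None if absent."""
    values = parse_qs(urlparse(url).query).get("deadline")
    return int(values[0]) if values else None


def is_expired(url, now=None):
    """True if the deadline has passed; False if there is no deadline to check."""
    deadline = url_deadline(url)
    if deadline is None:
        return False
    current = time.time() if now is None else now
    return current >= deadline


sample = "https://example.com/x.m4s?bvc=vod&deadline=1725512168&os=mcdn"
print(datetime.fromtimestamp(url_deadline(sample), tz=timezone.utc))
print(is_expired(sample))
```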
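Improvement:
Add request headers.
The available video quality depends on whether you are logged in, and the login state is carried by the Cookie, so copy the Cookie from the request headers as well.
Note: the cookie contains several single quotes ('), so wrap it in double quotes ("") when pasting it into the code.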
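If you would rather not paste one long Cookie header, requests also accepts cookies as a dict through the cookies= argument of requests.get. A minimal sketch for converting the copied string, assuming the usual "name=value; name=value" layout; cookie_string_to_dict is a hypothetical helper and the sample values are made up.

```python
# A minimal sketch: turn a copied Cookie header string into a dict.
# Values may contain single quotes, so the string itself is written in double quotes.


def cookie_string_to_dict(cookie):
    cookies = {}
    for part in cookie.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue  # skip empty or malformed pieces
        name, value = part.split("=", 1)  # split on the first "=" only
        cookies[name] = value
    return cookies


cookie = "buvid3=abc; rpdid=|(x|)J0J'u~; CURRENT_QUALITY=80"
print(cookie_string_to_dict(cookie))
```

The result can then be passed as requests.get(url, headers=headers, cookies=cookie_string_to_dict(cookie)), with the "Cookie" entry removed from headers.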
Full code:
# Import the HTTP request module
import requests
# Import the regular-expression module
import re
# Import the json module
import json
# Import os to build file paths and create the output folder
import os
url = 'https://www.bilibili.com/video/BV1H94y1k7JU/'
# Paste the Cookie value copied from your own logged-in browser (F12 > Network > Request Headers)
cookie = "PASTE_YOUR_COOKIE_HERE"
headers = {
    "Cookie": cookie,
    # Referer (anti-hotlinking): tells the server which page the request came from
    "Referer": 'https://space.bilibili.com/5970160?spm_id_from=333.337.0.0',
    # User-Agent: basic identity information about the browser/device
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36"
}
# Send the request
response = requests.get(url=url, headers=headers)
html = response.text
print(html)
# Parse the data: extract the video title
title = re.findall('title="(.*?)"', html)[0]
# Replace characters that are not allowed in Windows file names
title = re.sub(r'[\\/:*?"<>|]', '_', title)
print(title)
# Extract the video info (the JSON assigned to window.__playinfo__)
info = re.findall(r'window\.__playinfo__=(.*?)</script>', html)[0]
# info: convert the JSON string into a Python dict
json_data = json.loads(info)
# Extract the video link
video_url = json_data['data']['dash']['video'][0]['baseUrl']
print(video_url)
# Extract the audio link
audio_url = json_data['data']['dash']['audio'][0]['baseUrl']
print(audio_url)
# Get the video content
video_content = requests.get(url=video_url, headers=headers).content
# Get the audio content
audio_content = requests.get(url=audio_url, headers=headers).content
# Save the data (create the video folder first if it does not exist)
os.makedirs('video', exist_ok=True)
with open(os.path.join('video', title + '.mp4'), mode='wb') as v:
    v.write(video_content)
with open(os.path.join('video', title + '.mp3'), mode='wb') as a:
    a.write(audio_content)
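Running it prints the page source, from which the video title is extracted with a regular expression. Both the video and audio URLs sit inside window.__playinfo__=, which can likewise be captured with a regex. That block is a JSON string, so converting it into a dict lets you pick out each URL by key.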
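The two regex steps can be tried in isolation on a tiny, made-up HTML snippet. extract_playinfo and safe_filename are hypothetical helpers: re.S is added so the match still works if the JSON ever spans several lines, and safe_filename assumes Windows file-name rules, since titles can contain characters such as / or : that open() would reject.

```python
import json
import re


def extract_playinfo(html):
    """Return the dict assigned to window.__playinfo__ in the page source."""
    match = re.search(r"window\.__playinfo__=(.*?)</script>", html, re.S)
    if match is None:
        raise ValueError("window.__playinfo__ not found (missing cookie or changed page?)")
    return json.loads(match.group(1))


def safe_filename(name):
    """Replace characters that Windows does not allow in file names."""
    return re.sub(r'[\\/:*?"<>|]', "_", name).strip()


# Made-up page fragment shaped like the structure described above
html = (
    '<h1 title="Demo: part 1/2">Demo</h1>'
    '<script>window.__playinfo__={"data": {"dash": {'
    '"video": [{"baseUrl": "https://example.com/v.m4s"}],'
    '"audio": [{"baseUrl": "https://example.com/a.m4s"}]}}}</script>'
)
info = extract_playinfo(html)
print(info["data"]["dash"]["video"][0]["baseUrl"])
print(info["data"]["dash"]["audio"][0]["baseUrl"])
print(safe_filename("Demo: part 1/2"))
```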