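Open any Xigua Video URL, for example: https://www.ixigua.com/6903716672067076612
View the page source.
You can see that almost all of the information and parameters are included in it.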
url='https://www.ixigua.com/6903716672067076612'
response = requests.get(url, verify=False, headers=headers).text
pattern = re.compile('(?<=window._SSR_HYDRATED_DATA=).*?(?=</script>)')
jsonResult = pattern.findall(response)[0]
Here we go straight to the _SSR_HYDRATED_DATA variable and use a regular expression to match its content inside the script tag.
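To see what the regular expression captures without hitting the network, here is a minimal sketch that runs the same lookbehind/lookahead idea on a tiny, made-up HTML string. The sample_html content and the extract_hydrated_data helper are assumptions for illustration only; the real page is much larger and its layout may change.

```python
import re

# Hypothetical, trimmed-down stand-in for the real page source.
sample_html = (
    '<html><body>'
    '<script>window._SSR_HYDRATED_DATA={"anyVideo":{"gidInformation":{}}}</script>'
    '<script>console.log("other")</script>'
    '</body></html>'
)

# Raw string with the dot escaped; re.S lets the match span line breaks.
pattern = re.compile(r'(?<=window\._SSR_HYDRATED_DATA=).*?(?=</script>)', re.S)


def extract_hydrated_data(html):
    """Return the text assigned to window._SSR_HYDRATED_DATA, or None if absent."""
    match = pattern.search(html)
    return match.group(0) if match else None


raw = extract_hydrated_data(sample_html)
print(raw)
```

Using search() and checking for None avoids the IndexError that findall(...)[0] raises when the page does not contain the variable.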
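The result is a chunk of JSON data, but with one small problem:
some of the values are undefined, which is not valid JSON.
So we replace them, simply wrapping each one in double quotes.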
jsonResult = jsonResult.replace(':undefined', ':"undefined"')
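The replacement above turns undefined into the string "undefined". A possible variant, sketched here on a made-up input, maps it to JSON null instead so that it parses to Python None. The raw string and the load_hydrated_json helper are assumptions for illustration, and this is still a text-level replacement, so it could in principle also touch a string value that happens to contain ":undefined".

```python
import json
import re

# Hypothetical sample mimicking the shape of the problem, not real page data.
raw = '{"title":"demo","author":undefined,"stats":{"likes":3,"shares":undefined}}'


def load_hydrated_json(text):
    """Replace bare undefined values with null, then parse the JSON."""
    cleaned = re.sub(r':\s*undefined\b', ':null', text)
    return json.loads(cleaned)


data = load_hydrated_json(raw)
print(data['author'])
print(data['stats'])
```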
The information we need is all in here. Parse the string first, then walk down to the dash section:
jsonData = json.loads(jsonResult)
infor = jsonData['anyVideo']['gidInformation']['packerData']['video']
dash = infor['videoResource']['dash']
if 'dynamic_video' in dash.keys():
    audioUrl = dash['dynamic_video']['dynamic_audio_list'][0]['main_url']
    videoUrl = dash['dynamic_video']['dynamic_video_list'][0]['main_url']
else:
    print('Failed to get the source URL')
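The chained lookups above raise KeyError or IndexError as soon as one level is missing. One way to make that more forgiving is a small helper that walks the path and returns a default instead. This is only a sketch: dig is a hypothetical helper, and the sample dictionary just mirrors the key names used above with placeholder values.

```python
def dig(data, *keys, default=None):
    """Walk nested dicts/lists by key or index; return default if any step is missing."""
    current = data
    for key in keys:
        if isinstance(current, dict):
            if key not in current:
                return default
            current = current[key]
        elif isinstance(current, list):
            if not isinstance(key, int) or not -len(current) <= key < len(current):
                return default
            current = current[key]
        else:
            return default
    return current


# Placeholder structure using the same key names as the page data.
sample = {
    'anyVideo': {'gidInformation': {'packerData': {'video': {'videoResource': {'dash': {
        'dynamic_video': {
            'dynamic_audio_list': [{'main_url': 'YQ=='}],
            'dynamic_video_list': [{'main_url': 'dg=='}],
        }
    }}}}}}
}

base = ('anyVideo', 'gidInformation', 'packerData', 'video', 'videoResource', 'dash')
audio = dig(sample, *base, 'dynamic_video', 'dynamic_audio_list', 0, 'main_url')
missing = dig(sample, *base, 'normal', 'video_list', 0, 'main_url')
print(audio, missing)
```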
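This gives us the source addresses of the audio and the video directly.
However, main_url is not a plain URL yet: it is Base64-encoded.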
audio_url = base64.b64decode(audioUrl).decode("utf-8")
video_url = base64.b64decode(videoUrl).decode("utf-8")
Decode it with Base64 and we have the real source URLs of the audio and the video.
Full code:
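As a quick offline check of the decoding step, the sketch below round-trips a made-up URL through Base64. The decode_main_url helper and the example URL are assumptions; the padding repair is only a defensive measure in case a value arrives with its trailing "=" characters stripped, which the page above is not known to do.

```python
import base64


def decode_main_url(encoded):
    """Decode a Base64 string to text, restoring any missing '=' padding first."""
    padded = encoded + '=' * (-len(encoded) % 4)
    return base64.b64decode(padded).decode('utf-8')


# Made-up URL standing in for a real main_url value.
plain = 'https://example.com/v.mp4'
encoded = base64.b64encode(plain.encode('utf-8')).decode('ascii')

print(decode_main_url(encoded))
print(decode_main_url(encoded.rstrip('=')))
```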
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2021/2/23 12:18
# @Author : pp
# @Software: PyCharm
import requests
import urllib3
urllib3.disable_warnings()
import re
import json
import base64
cookie = 'your cookie'
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
    "cookie": cookie
}
def getRealUrl(url):
    # Fetch the page HTML (certificate verification is disabled here)
    response = requests.get(url, verify=False, headers=headers).text
    # Capture everything assigned to window._SSR_HYDRATED_DATA up to the closing script tag
    pattern = re.compile(r'(?<=window\._SSR_HYDRATED_DATA=).*?(?=</script>)')
    jsonResult = pattern.findall(response)[0]
    # undefined is not valid JSON, so quote it before parsing
    jsonResult = jsonResult.replace(':undefined', ':"undefined"')
    jsonData = json.loads(jsonResult)
    infor = jsonData['anyVideo']['gidInformation']['packerData']['video']
    dash = infor['videoResource']['dash']
    if 'dynamic_video' not in dash:
        # Stop here; otherwise audioUrl and videoUrl would be undefined below
        raise ValueError('Failed to get the source URL')
    audioUrl = dash['dynamic_video']['dynamic_audio_list'][0]['main_url']
    videoUrl = dash['dynamic_video']['dynamic_video_list'][0]['main_url']
    # main_url is Base64-encoded
    audio_url = base64.b64decode(audioUrl).decode("utf-8")
    video_url = base64.b64decode(videoUrl).decode("utf-8")
    return audio_url, video_url
baseUrl='https://www.ixigua.com/6903716672067076612'
audio_url,video_url=getRealUrl(baseUrl)
print(audio_url)
print(video_url)