Background
Python 3.8
Windows 10
Chrome
Charles
Goal
- Site: https://weixin.sogou.com/
- Fetch the content of WeChat official account articles
Process
Sogou WeChat list page
- Get the detail-page link
href="/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toT_Tn2j42MFRVGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nkfffpKeIKEhRBeQ7G0thG5XpEcc4HfRvLd1KUjSkao4yxsOMKOcnvZDSDZWVslU657CZyNbec73WkNT_QICA71r1dbZWbKUeQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A971863740828180BABF63558B7CED06BA05DE98622EA7A3"
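Build the full link:
import urllib.parse

# Take the relative href of the first result link, then resolve it against the page URL
url = news.css('div.txt-box > h3 > a::attr(href)').extract_first()
url = urllib.parse.urljoin(response.url, url)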
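To make the joining step concrete, here is a minimal standalone sketch of what `urljoin` does with a root-relative href like the one above. The list-page URL is abbreviated and the href value is a shortened placeholder, so both are illustrative rather than real captures.

```python
from urllib.parse import urljoin

# Abbreviated list-page URL and a shortened placeholder href (not a real capture).
list_page = "https://weixin.sogou.com/weixin?type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94"
href = "/link?url=dn9a_-gY295K0Rci&type=2"

# A root-relative href replaces the path and query of the base URL.
full = urljoin(list_page, href)
print(full)  # https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci&type=2
```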
Sogou WeChat detail page
- Detail-page link
https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toT_Tn2j42MFRVGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nkfffpKeIKEhRBeQ7G0thG5XpEcc4HfRvLd1KUjSkao4yxsOMKOcnvZDSDZWVslU657CZyNbec73WkNT_QICA71r1dbZWbKUeQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A971863740828180BABF63558B7CED06BA05DE98622EA7A3
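- Open the detail-page link directly in the browser (with cookies cleared)
- Request it with Postman
- Refresh the list page in the browser (this triggers Set-Cookie, fills the cache, and generates the various parameters)
- Then open the detail page directly in the browser
Summary:
- The detail page can only be accessed with the required parameters
- The page redirects (301)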
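Problems
- Requests are limited per IP
  - Requesting the list page too quickly gets the IP banned (verified)
  - There may be a daily request quota (not verified)
- How to get the real link after the redirect
- WeChat links expire
Solutions
The IP problem
Use proxy IPs.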
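As a rough illustration of the proxy idea, the sketch below rotates through a proxy pool and computes a randomized delay between list-page requests. The proxy addresses are placeholders from a documentation-only range, and the names `PROXIES`, `next_opener`, and `polite_delay` are hypothetical; the delay values are guesses, not limits measured against Sogou.

```python
import itertools
import random
import urllib.request

# Placeholder proxy addresses (documentation-only range); replace with working proxies.
PROXIES = ["http://192.0.2.10:8080", "http://192.0.2.11:8080"]
_proxy_cycle = itertools.cycle(PROXIES)

def next_opener():
    """Return the next proxy in the pool and an opener that routes through it."""
    proxy = next(_proxy_cycle)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return proxy, urllib.request.build_opener(handler)

def polite_delay(base=2.0, jitter=1.0):
    """Seconds to wait before the next list-page request (base plus random jitter)."""
    return base + random.uniform(0, jitter)
```

A crawler loop would call `time.sleep(polite_delay())` between requests and fetch each page with `opener.open(url)` from a fresh `next_opener()` call.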
Getting the real link from Sogou WeChat
- A 301 redirect occurs
curl -H "Host: mp.weixin.qq.com" -H "Upgrade-Insecure-Requests: 1" -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9" -H "Referer: https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toa5ZWgFUPIu5VGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nrFzx6-3M-9Ud3TUgO8nEfttSG3lUTs25OkkcsTUz87GouPLeK06uSd2rPlJlESjn-37p7qSMapfYNakzj9lgtwYJSSFPgfogQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A9871CD7955756556E6BB54BA8C36F206F25D310622EAE02&k=48&h=F" -H "Accept-Language: zh-CN,zh;q=0.9" -H "Cookie: rewardsn=; wxtokenkey=777" --compressed "http://mp.weixin.qq.com/s?src=11&timestamp=1647226370&ver=3675&signature=ykBEMHUd-2F9qhxMiD*XJI9QZi4qDEaneJX2DGYg0tibi8Jn*QtbK0-MYOhyn9AC7Jmpw-gN83DXTNFBpU37jcgNnolYWG2ZSRoKGX7hdMQC387Oj9sGtV5We9q6d59*&new=1"
Request URL
http://mp.weixin.qq.com/s?src=11&timestamp=1647226370&ver=3675&signature=ykBEMHUd-2F9qhxMiD*XJI9QZi4qDEaneJX2DGYg0tibi8Jn*QtbK0-MYOhyn9AC7Jmpw-gN83DXTNFBpU37jcgNnolYWG2ZSRoKGX7hdMQC387Oj9sGtV5We9q6d59*&new=1
Response body
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html;charset=utf8">
<meta id="viewport" name="viewport" content="width=device-width,initial-scale=1.0,maximum-scale=1.0,user-scalable=0" />
<title>未知错误</title>
<style>
html,body{
height:100%;
padding:0px;
margin:0px;
}
body{
background-color: #F4F4F4;
}
.panel {
padding: 18px 22px 10px;
}
.mesg-block{
margin-bottom:20px;
}
.mesg-block p{
font-size: 16px;
line-height: 1.3em;
color: #858585;
text-shadow: 0px 1px 0px #FFF;
text-align:center;
}
</style>
</head>
<body>
<div class="panel">
<div class="mesg-block">
<p>未知错误,请稍后再试</p>
</div>
</div>
<script>
(function(){
document.addEventListener('WeixinJSBridgeReady', function onBridgeReady() {
var appId = '',
imgUrl = ''
link = 'http://mp.weixin.qq.com/mp/conference/default/share',
title = '失效的验证页面'
desc = '你暂无权限查看此页面内容。',
content = '#微信分享#,你暂无权限查看此页面内容。';
WeixinJSBridge.on('menu:share:appmessage', function(argv){
WeixinJSBridge.invoke('sendAppMessage',{
"appid":appId,
"img_url":imgUrl,
"img_width":"640",
"img_height":"640",
"link":link,
"desc":desc,
"title":title
}, function(res) {})
});
WeixinJSBridge.on('menu:share:timeline', function(argv){
WeixinJSBridge.invoke('shareTimeline',{
"img_url":imgUrl,
"img_width":"640",
"img_height":"640",
"link":link,
"desc": desc,
"title":title
}, function(res) {
});
});
var weiboContent = '';
WeixinJSBridge.on('menu:share:weibo', function(argv){
WeixinJSBridge.invoke('shareWeibo',{
"content":content,
"url":link,
}, function(res) {
});
});
WeixinJSBridge.call('hideOptionMenu');
}, false);
})();
</script>
</body>
</head>
</html>
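The body only says "未知错误,请稍后再试" (unknown error, please try again later), so this is not the real link.
- The redirect target can be read from the Location header of the response
Location: https://mp.weixin.qq.com/s?src=11&timestamp=1647226370&ver=3675&signature=ykBEMHUd-2F9qhxMiD*XJI9QZi4qDEaneJX2DGYg0tibi8Jn*QtbK0-MYOhyn9AC7Jmpw-gN83DXTNFBpU37jcgNnolYWG2ZSRoKGX7hdMQC387Oj9sGtV5We9q6d59*&new=1
- The real, valid request
After the redirect, one more request is made, and this one goes to the real link.
Request URL
https://mp.weixin.qq.com/s?src=11&timestamp=1647226370&ver=3675&signature=ykBEMHUd-2F9qhxMiD*XJI9QZi4qDEaneJX2DGYg0tibi8Jn*QtbK0-MYOhyn9AC7Jmpw-gN83DXTNFBpU37jcgNnolYWG2ZSRoKGX7hdMQC387Oj9sGtV5We9q6d59*&new=1
Analysis:
- The URL of the real, valid request is taken from the Location header of the redirected request
- So once we have the URL of the request that gets 301-redirected, we can obtain the real address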
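Reading the Location header without following the redirect can be sketched with the standard library as follows. `get_redirect_location` is a hypothetical helper name, and the sketch sends no browser headers or cookies, which the real site may well require.

```python
import http.client
from urllib.parse import urlsplit

def get_redirect_location(url, timeout=10):
    """Send one GET without following redirects; return the Location header, or None."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        # Only redirect responses carry the target we are after.
        if resp.status in (301, 302, 303, 307, 308):
            return resp.getheader("Location")
        return None
    finally:
        conn.close()
```

`http.client` never follows redirects on its own, which is why it is used here instead of `urllib.request.urlopen`.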
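Continuing
- Let's try searching the captured traffic for a fragment of the real link (lots of trial and error...)
This does turn up a suspicious request (we'll call it the concatenation request)
Response body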
<meta content="always" name="referrer">
<script>
(new Image()).src = 'https://weixin.sogou.com/approve?uuid=' + '7bc50a8b-2449-43a3-b852-ae83b83ee01c' + '&token=' + 'A9874F20955756556E6BB54BA8C36F206F25D310622EAE11' + '&from=inner';
setTimeout(function () {
var url = '';
url += 'http://mp.w';
url += 'eixin.qq.co';
url += 'm/s?src=11&';
url += 'timestamp=1';
url += '647226370&v';
url += 'er=3675&sig';
url += 'nature=ykBE';
url += 'MHUd-2F9qhx';
url += 'MiD*XJI9QZi';
url += '4qDEaneJX2D';
url += 'GYg0tibi8Jn*QtbK0-MYOhyn9AC7Jmpw-gN83DXTNFBpU37jcgNnolYWG2ZSRoKGX7hdMQC387Oj9sGtV5We9q6d59*&new=1';
url.replace("@", "");
window.location.replace(url)
},100);
</script>
curl -H "Host: weixin.sogou.com" -H "sec-ch-ua: \" Not A;Brand\";v=\"99\", \"Chromium\";v=\"98\", \"Google Chrome\";v=\"98\"" -H "sec-ch-ua-mobile: ?0" -H "sec-ch-ua-platform: \"Windows\"" -H "Upgrade-Insecure-Requests: 1" -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9" -H "Sec-Fetch-Site: same-origin" -H "Sec-Fetch-Mode: navigate" -H "Sec-Fetch-User: ?1" -H "Sec-Fetch-Dest: document" -H "Referer: https://weixin.sogou.com/weixin?type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&ie=utf8&s_from=input&_sug_=n&_sug_type_=1&w=01015002&oq=&ri=6&sourceid=sugg&sut=0&sst0=1647224739319&lkt=0%2C0%2C0&p=40040108" -H "Accept-Language: zh-CN,zh;q=0.9" -H "Cookie: ABTEST=0|1647226370|v1; SNUID=955756556E6BB54BA8C36F206F25D310; IPLOC=CN3701; SUID=FA38383A1B0DA00A00000000622EAE02; SUID=FA38383A6555A00A00000000622EAE02; JSESSIONID=aaaPefwhkqDB384EsWe9x; SUV=00B5600B3A3838FA622EAE0242604246; ariaDefaultTheme=undefined" --compressed "https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toa5ZWgFUPIu5VGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nrFzx6-3M-9Ud3TUgO8nEfttSG3lUTs25OkkcsTUz87GouPLeK06uSd2rPlJlESjn-37p7qSMapfYNakzj9lgtwYJSSFPgfogQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A9871CD7955756556E6BB54BA8C36F206F25D310622EAE02&k=48&h=F"
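- The response of the concatenation request assembles, piece by piece, the URL of the request that gets 301-redirected
- If we can reproduce this request, we are done
Analysis
- URL of the concatenation request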
https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toa5ZWgFUPIu5VGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nrFzx6-3M-9Ud3TUgO8nEfttSG3lUTs25OkkcsTUz87GouPLeK06uSd2rPlJlESjn-37p7qSMapfYNakzj9lgtwYJSSFPgfogQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A9871CD7955756556E6BB54BA8C36F206F25D310622EAE02&k=48&h=F
- It looks very similar to the link we got from the Sogou list page; compare the two
https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toa5ZWgFUPIu5VGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nrFzx6-3M-9Ud3TUgO8nEfttSG3lUTs25OkkcsTUz87GouPLeK06uSd2rPlJlESjn-37p7qSMapfYNakzj9lgtwYJSSFPgfogQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A9871CD7955756556E6BB54BA8C36F206F25D310622EAE02&k=48&h=F
https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toT_Tn2j42MFRVGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nkfffpKeIKEhRBeQ7G0thG5XpEcc4HfRvLd1KUjSkao4yxsOMKOcnvZDSDZWVslU657CZyNbec73WkNT_QICA71r1dbZWbKUeQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A971863740828180BABF63558B7CED06BA05DE98622EA7A3
- The concatenation request has extra parameters: &k=48&h=F
- Work out how these parameters are generated and we are done
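The comparison can also be done mechanically instead of by eye. The sketch below lists the query parameters that appear in one link but not in the other; `extra_query_params` is a hypothetical helper, and the two sample links are abbreviated stand-ins for the captured ones (the url and token values are shortened).

```python
from urllib.parse import urlsplit, parse_qs

def extra_query_params(url_a, url_b):
    """Return the query parameters present in url_a but missing from url_b."""
    params_a = parse_qs(urlsplit(url_a).query)
    params_b = parse_qs(urlsplit(url_b).query)
    return {k: v for k, v in params_a.items() if k not in params_b}

# Abbreviated stand-ins for the two links above.
clicked = "https://weixin.sogou.com/link?url=dn9a_&type=2&query=%E6%B5%8E&token=A987&k=48&h=F"
listed = "https://weixin.sogou.com/link?url=dn9a_&type=2&query=%E6%B5%8E&token=A971"
print(extra_query_params(clicked, listed))  # {'k': ['48'], 'h': ['F']}
```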
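Continuing
- So where does &k=48&h=F come from? Let's search for it
- Unfortunately, searching in Charles turned up nothing useful, so let's try in the browser
- Nothing there either, so shorten the search terms: k=, h=, &k, &h
This piece of JS looks suspicious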
<script>
(function() {
$("a").on("mousedown click contextmenu", function() {
var b = Math.floor(100 * Math.random()) + 1
, a = this.href.indexOf("url=")
, c = this.href.indexOf("&k=");
-1 !== a && -1 === c && (a = this.href.substr(a + 4 + parseInt("21") + b, 1),
this.href += "&k=" + b + "&h=" + a)
})
}
)();
</script>
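- I can't read every detail, but my understanding is: when a link is pressed or clicked, the handler picks a random k between 1 and 100 and appends &k= and &h= to the link, where h is a single character taken from the link itself
- The links in question are presumably the ones on the list page
- Let's check whether the links on the list page change
- They do change (compared with the screenshot under Process > Sogou WeChat list page in this post)
- So it is this JS
- Translated into Python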
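import random

def get_k_h(self, url):
    # Mirror the page's JS: pick b in 1..100, then append &k=b and &h=<one character of the link>
    b = int(random.random() * 100) + 1
    a = url.find("url=")
    # h is the single character at offset 4 + 21 + b from the start of "url="
    pos = a + 4 + 21 + b
    url = url + "&k=" + str(b) + "&h=" + url[pos:pos + 1]
    return url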
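As a sanity check on the translation, here is a standalone variant that also keeps the two guards from the JS and lets k be fixed. `append_k_h` is a hypothetical name. With k fixed to 48, it picks the same h=F that appears in the captured link; the sample keeps only the first 80 characters of the captured url value for brevity.

```python
import random

def append_k_h(href, b=None):
    """Append &k= and &h= the way the list page's click handler does."""
    if b is None:
        b = random.randint(1, 100)  # same range as Math.floor(100 * Math.random()) + 1
    a = href.find("url=")
    if a == -1 or "&k=" in href:
        return href  # the JS leaves such links untouched
    pos = a + 4 + 21 + b
    return href + "&k=" + str(b) + "&h=" + href[pos:pos + 1]

# Captured link, with the url value cut to its first 80 characters for brevity.
sample = ("https://weixin.sogou.com/link?url="
          "dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toa5ZWgFUPIu5VGL8s"
          "&type=2")
print(append_k_h(sample, b=48))  # ends with &k=48&h=F
```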
- Simulate the concatenation request
- Extract the fragments of the real link from the response and join them
import re

def get_real_url(self, content):
    # Collect every quoted fragment assigned with `= '...';` in the script and join them into the real URL
    url_text = re.findall(r"= '(\S+?)';", content)
    best_url = ''.join(url_text)
    return best_url
- At this point we have the real link
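The joining step can be tried offline on a piece of the captured response. This sketch uses a slightly stricter pattern that only matches `url += '...'` lines; `join_url_fragments` is a hypothetical name, and the sample contains just the first six fragments of the script shown earlier.

```python
import re

def join_url_fragments(html):
    """Join the string pieces appended with `url += '...'` in the response script."""
    pieces = re.findall(r"url\s*\+=\s*'([^']*)'", html)
    return "".join(pieces)

# First six fragments of the captured script; the remaining ones are omitted.
sample = """
setTimeout(function () {
var url = '';
url += 'http://mp.w';
url += 'eixin.qq.co';
url += 'm/s?src=11&';
url += 'timestamp=1';
url += '647226370&v';
url += 'er=3675&sig';
window.location.replace(url)
},100);
"""
print(join_url_fragments(sample))
# http://mp.weixin.qq.com/s?src=11&timestamp=1647226370&ver=3675&sig
```

The joined result is the http:// address, which is the one that then gets 301-redirected to the https:// address.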
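WeChat links expire
- Third-party services that convert temporary links into permanent ones
  - I used Shenjianshou (神箭手) in the past, but it no longer exists
  - Other platforms offer paid APIs; I won't name them here, to avoid looking like an advertisement
- Get the permanent link from the WeChat Official Accounts Platform
Closing notes
- If you found this useful, a like would be encouraging
- If anything is unclear, feel free to ask me
- If you need help, feel free to ask me
- If you spot a mistake, please point it out
- If you know a better approach, please share it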