Getting the Real Article Links from Sogou WeChat Search

Background

Python 3.8
Windows 10
Chrome
Charles

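Goal

Obtain the real mp.weixin.qq.com URL behind each article listed in Sogou WeChat search results.

Process

Sogou WeChat list page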

image.png
  • Get the detail-page link
href="/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toT_Tn2j42MFRVGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nkfffpKeIKEhRBeQ7G0thG5XpEcc4HfRvLd1KUjSkao4yxsOMKOcnvZDSDZWVslU657CZyNbec73WkNT_QICA71r1dbZWbKUeQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A971863740828180BABF63558B7CED06BA05DE98622EA7A3"

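Join it into a full URL

import urllib.parse
# The href is relative ("/link?url=..."), so resolve it against the list-page URL
url = news.css('div.txt-box > h3 > a::attr(href)').extract_first()
url = urllib.parse.urljoin(response.url, url)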
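
To see what the join step does without running a spider, here is a minimal standalone sketch. The list-page URL is a shortened form of the search URL, and the href is a hypothetical shortened stand-in for the long value shown above; only urljoin itself is taken from the code.

```python
from urllib.parse import urljoin

# Shortened search URL of the list page (an assumption for illustration).
list_page = "https://weixin.sogou.com/weixin?type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94"
# Hypothetical shortened stand-in for the relative href scraped from the page.
href = "/link?url=EXAMPLE..&type=2&query=q&token=T"

# A leading "/" makes the href replace the whole path and query of the base URL.
full_url = urljoin(list_page, href)
print(full_url)  # https://weixin.sogou.com/link?url=EXAMPLE..&type=2&query=q&token=T
```

Only the scheme and host of the list page survive; its own path and query string are dropped.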

Sogou WeChat detail page

  • Detail-page link
https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toT_Tn2j42MFRVGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nkfffpKeIKEhRBeQ7G0thG5XpEcc4HfRvLd1KUjSkao4yxsOMKOcnvZDSDZWVslU657CZyNbec73WkNT_QICA71r1dbZWbKUeQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A971863740828180BABF63558B7CED06BA05DE98622EA7A3
  • Open the detail-page link directly in the browser (with cookies cleared)


    image.png
  • Request it with Postman


    image.png
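  • Refresh the list page in the browser (this sets cookies, fills the cache, and generates the various parameters)

  • Open the detail page directly in the browser again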


    image.png

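Summary:

  • The detail page can only be accessed with the right parameters (the cookies and values generated when the list page loads)
  • The page redirects (301)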

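Problems

  • IP restrictions
    1. Requesting list pages too quickly gets the IP banned (verified)
    2. A cap on total requests per day (not verified)
  • How to obtain the real link after the redirect
  • WeChat links expire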

Solutions

The IP problem

Use proxy IPs
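
One way to combine proxy IPs with a slower request rate is sketched below. This is only an illustration, not the setup used in this article: the ProxyRotator class, the proxy addresses, and the interval are all hypothetical, and a real crawler would plug in its own proxy pool.

```python
import itertools
import time
import urllib.request

# Hypothetical proxy addresses; replace with a real proxy pool.
PROXIES = ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]


class ProxyRotator:
    """Hand out openers that cycle through proxies, spaced by a minimum interval."""

    def __init__(self, proxies, min_interval=0.0):
        self._cycle = itertools.cycle(proxies)
        self._min_interval = min_interval
        self._last = None  # time of the previous hand-out, None before the first

    def next_opener(self):
        # Sleep if the previous opener was handed out too recently.
        if self._last is not None:
            wait = self._min_interval - (time.monotonic() - self._last)
            if wait > 0:
                time.sleep(wait)
        self._last = time.monotonic()
        proxy = next(self._cycle)
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return proxy, urllib.request.build_opener(handler)


rotator = ProxyRotator(PROXIES, min_interval=0.0)
for _ in range(3):
    proxy, opener = rotator.next_opener()
    print(proxy)  # opener.open(url) would send the request through this proxy
```

Setting min_interval to a few seconds addresses the verified ban on fast list-page requests; rotating proxies spreads the remaining load across several IPs.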

Getting the real Sogou WeChat link

image.png
  • A 301 redirect occurs
curl -H "Host: mp.weixin.qq.com" -H "Upgrade-Insecure-Requests: 1" -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9" -H "Referer: https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toa5ZWgFUPIu5VGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nrFzx6-3M-9Ud3TUgO8nEfttSG3lUTs25OkkcsTUz87GouPLeK06uSd2rPlJlESjn-37p7qSMapfYNakzj9lgtwYJSSFPgfogQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A9871CD7955756556E6BB54BA8C36F206F25D310622EAE02&k=48&h=F" -H "Accept-Language: zh-CN,zh;q=0.9" -H "Cookie: rewardsn=; wxtokenkey=777" --compressed "http://mp.weixin.qq.com/s?src=11&timestamp=1647226370&ver=3675&signature=ykBEMHUd-2F9qhxMiD*XJI9QZi4qDEaneJX2DGYg0tibi8Jn*QtbK0-MYOhyn9AC7Jmpw-gN83DXTNFBpU37jcgNnolYWG2ZSRoKGX7hdMQC387Oj9sGtV5We9q6d59*&new=1"

Requested URL

http://mp.weixin.qq.com/s?src=11&timestamp=1647226370&ver=3675&signature=ykBEMHUd-2F9qhxMiD*XJI9QZi4qDEaneJX2DGYg0tibi8Jn*QtbK0-MYOhyn9AC7Jmpw-gN83DXTNFBpU37jcgNnolYWG2ZSRoKGX7hdMQC387Oj9sGtV5We9q6d59*&new=1

Response body

<!DOCTYPE html>
<html>
    <head>
        <meta http-equiv="content-type" content="text/html;charset=utf8">
        <meta id="viewport" name="viewport" content="width=device-width,initial-scale=1.0,maximum-scale=1.0,user-scalable=0" />
        <title>未知错误</title>
        <style> 
            html,body{
                height:100%;
                padding:0px;
                margin:0px;
            }
            body{
                background-color: #F4F4F4;
            }
            .panel {
                padding: 18px 22px 10px;    
            }
            .mesg-block{
                margin-bottom:20px;
            }
            .mesg-block p{
                font-size: 16px;
                line-height: 1.3em;
                color: #858585;
                text-shadow: 0px 1px 0px #FFF;
                text-align:center;
            }
        </style>
    </head>
    <body>
        <div class="panel">
            <div class="mesg-block">
                <p>未知错误,请稍后再试</p>
            </div>  
        </div>
        <script>
            (function(){
                
                document.addEventListener('WeixinJSBridgeReady', function onBridgeReady() {
                    var appId = '',
                        imgUrl = ''
                        link = 'http://mp.weixin.qq.com/mp/conference/default/share',
                        title = '失效的验证页面'
                        desc = '你暂无权限查看此页面内容。',
                        content = '#微信分享#,你暂无权限查看此页面内容。';
                    
                    
                    WeixinJSBridge.on('menu:share:appmessage', function(argv){
                        WeixinJSBridge.invoke('sendAppMessage',{
                                              "appid":appId,
                                              "img_url":imgUrl,
                                              "img_width":"640",
                                              "img_height":"640",
                                              "link":link,
                                              "desc":desc,
                                              "title":title
                                              }, function(res) {})
                    });
                    
                    WeixinJSBridge.on('menu:share:timeline', function(argv){
                        WeixinJSBridge.invoke('shareTimeline',{
                                              "img_url":imgUrl,
                                              "img_width":"640",
                                              "img_height":"640",
                                              "link":link,
                                              "desc": desc,
                                              "title":title
                                              }, function(res) {
                                              });
                    });

                    
                    var weiboContent = '';
                    WeixinJSBridge.on('menu:share:weibo', function(argv){
                        WeixinJSBridge.invoke('shareWeibo',{
                                              "content":content,
                                              "url":link,
                                              }, function(res) {
                                              });
                    });
                    
                    WeixinJSBridge.call('hideOptionMenu');
                }, false);
            })();
        </script>
        </body>
    </head>
</html>

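As you can see, this is not the real link: the response is an error page (its title, 未知错误, means "unknown error").

  • The redirect target can be read from the Location header of the response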
Location: https://mp.weixin.qq.com/s?src=11&timestamp=1647226370&ver=3675&signature=ykBEMHUd-2F9qhxMiD*XJI9QZi4qDEaneJX2DGYg0tibi8Jn*QtbK0-MYOhyn9AC7Jmpw-gN83DXTNFBpU37jcgNnolYWG2ZSRoKGX7hdMQC387Oj9sGtV5We9q6d59*&new=1
image.png
  • The real, valid request
    After the redirect, one more request is made, and its URL is the real link


    image.png

    Request URL

https://mp.weixin.qq.com/s?src=11&timestamp=1647226370&ver=3675&signature=ykBEMHUd-2F9qhxMiD*XJI9QZi4qDEaneJX2DGYg0tibi8Jn*QtbK0-MYOhyn9AC7Jmpw-gN83DXTNFBpU37jcgNnolYWG2ZSRoKGX7hdMQC387Oj9sGtV5We9q6d59*&new=1

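Analysis:

  • The URL of the real, valid request comes from the Location header of the redirect response
  • So once we have the URL of the request that gets the 301 redirect, we can obtain the real address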
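
Comparing the two captured URLs makes the relationship concrete: in this capture the Location value is the requested URL with http swapped for https. The short check below illustrates that; the helper name differs_only_in_scheme is ours, and the URLs are the ones captured above.

```python
from urllib.parse import urlsplit


def differs_only_in_scheme(a: str, b: str) -> bool:
    """True when two URLs are identical apart from their scheme (http vs https)."""
    pa, pb = urlsplit(a), urlsplit(b)
    # SplitResult is a tuple: (scheme, netloc, path, query, fragment).
    return pa.scheme != pb.scheme and pa[1:] == pb[1:]


# Everything after the scheme, copied from the captured request above.
tail = ("mp.weixin.qq.com/s?src=11&timestamp=1647226370&ver=3675"
        "&signature=ykBEMHUd-2F9qhxMiD*XJI9QZi4qDEaneJX2DGYg0tibi8Jn"
        "*QtbK0-MYOhyn9AC7Jmpw-gN83DXTNFBpU37jcgNnolYWG2ZSRoKGX7hdMQC387Oj9sGtV5We9q6d59*&new=1")
requested = "http://" + tail   # the request that receives the 301
location = "https://" + tail   # the Location header of that response

print(differs_only_in_scheme(requested, location))  # True
```

This is an observation about this one capture, not a guarantee; reading the Location header remains the reliable way to get the final address.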

Continuing

  • We try searching directly for a fragment of the real link (lots of trial and error...)


    image.png
  • We really do find a suspicious request (let's call it the concatenation request)
    Response body


<meta content="always" name="referrer">
<script>

    (new Image()).src = 'https://weixin.sogou.com/approve?uuid=' + '7bc50a8b-2449-43a3-b852-ae83b83ee01c' + '&token=' + 'A9874F20955756556E6BB54BA8C36F206F25D310622EAE11' + '&from=inner';

    setTimeout(function () {
        var url = '';
        url += 'http://mp.w';
        url += 'eixin.qq.co';
        url += 'm/s?src=11&';
        url += 'timestamp=1';
        url += '647226370&v';
        url += 'er=3675&sig';
        url += 'nature=ykBE';
        url += 'MHUd-2F9qhx';
        url += 'MiD*XJI9QZi';
        url += '4qDEaneJX2D';
        url += 'GYg0tibi8Jn*QtbK0-MYOhyn9AC7Jmpw-gN83DXTNFBpU37jcgNnolYWG2ZSRoKGX7hdMQC387Oj9sGtV5We9q6d59*&new=1';
        url.replace("@", "");
        window.location.replace(url)
    },100);

</script>
curl -H "Host: weixin.sogou.com" -H "sec-ch-ua: \" Not A;Brand\";v=\"99\", \"Chromium\";v=\"98\", \"Google Chrome\";v=\"98\"" -H "sec-ch-ua-mobile: ?0" -H "sec-ch-ua-platform: \"Windows\"" -H "Upgrade-Insecure-Requests: 1" -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9" -H "Sec-Fetch-Site: same-origin" -H "Sec-Fetch-Mode: navigate" -H "Sec-Fetch-User: ?1" -H "Sec-Fetch-Dest: document" -H "Referer: https://weixin.sogou.com/weixin?type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&ie=utf8&s_from=input&_sug_=n&_sug_type_=1&w=01015002&oq=&ri=6&sourceid=sugg&sut=0&sst0=1647224739319&lkt=0%2C0%2C0&p=40040108" -H "Accept-Language: zh-CN,zh;q=0.9" -H "Cookie: ABTEST=0|1647226370|v1; SNUID=955756556E6BB54BA8C36F206F25D310; IPLOC=CN3701; SUID=FA38383A1B0DA00A00000000622EAE02; SUID=FA38383A6555A00A00000000622EAE02; JSESSIONID=aaaPefwhkqDB384EsWe9x; SUV=00B5600B3A3838FA622EAE0242604246; ariaDefaultTheme=undefined" --compressed "https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toa5ZWgFUPIu5VGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nrFzx6-3M-9Ud3TUgO8nEfttSG3lUTs25OkkcsTUz87GouPLeK06uSd2rPlJlESjn-37p7qSMapfYNakzj9lgtwYJSSFPgfogQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A9871CD7955756556E6BB54BA8C36F206F25D310622EAE02&k=48&h=F"
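  • You can see that the response to the concatenation request assembles, piece by piece, the URL of the request that gets the 301 redirect
  • If we can reproduce this request, we are done

Analysis

  • URL of the concatenation request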
https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toa5ZWgFUPIu5VGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nrFzx6-3M-9Ud3TUgO8nEfttSG3lUTs25OkkcsTUz87GouPLeK06uSd2rPlJlESjn-37p7qSMapfYNakzj9lgtwYJSSFPgfogQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A9871CD7955756556E6BB54BA8C36F206F25D310622EAE02&k=48&h=F
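  • It looks very similar to the link we got from the Sogou list page; compare the two (concatenation request first, list-page link second)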
https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toa5ZWgFUPIu5VGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nrFzx6-3M-9Ud3TUgO8nEfttSG3lUTs25OkkcsTUz87GouPLeK06uSd2rPlJlESjn-37p7qSMapfYNakzj9lgtwYJSSFPgfogQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A9871CD7955756556E6BB54BA8C36F206F25D310622EAE02&k=48&h=F
https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toT_Tn2j42MFRVGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nkfffpKeIKEhRBeQ7G0thG5XpEcc4HfRvLd1KUjSkao4yxsOMKOcnvZDSDZWVslU657CZyNbec73WkNT_QICA71r1dbZWbKUeQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A971863740828180BABF63558B7CED06BA05DE98622EA7A3
  • The concatenation request has extra parameters: &k=48&h=F
  • Once we find out how these parameters are generated, we are done
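
Rather than comparing two long URLs by eye, the extra parameter names can be found by diffing the parsed query strings. A minimal sketch follows; the function name extra_query_keys is ours, and the two URLs are hypothetical shortened stand-ins for the real ones above.

```python
from urllib.parse import parse_qs, urlsplit


def extra_query_keys(clicked_url: str, listed_url: str) -> set:
    """Return query parameter names present in clicked_url but not in listed_url."""
    clicked = parse_qs(urlsplit(clicked_url).query)
    listed = parse_qs(urlsplit(listed_url).query)
    return set(clicked) - set(listed)


# Hypothetical shortened stand-ins for the list-page link and the concatenation request.
listed = "https://weixin.sogou.com/link?url=AAA..&type=2&query=q&token=T1"
clicked = "https://weixin.sogou.com/link?url=BBB..&type=2&query=q&token=T2&k=48&h=F"

print(sorted(extra_query_keys(clicked, listed)))  # ['h', 'k']
```

The values of url and token also differ between the two real links, but those come from separate page loads; only k and h are new parameter names.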

Continuing

  • So where does &k=48&h=F come from? Let's search again


    image.png
  • Sorry, nothing useful turns up in Charles, so let's try in the browser


    image.png
  • Nothing there either, so shorten the search term and try k=, h=, &k, &h


    image.png
  • This piece of JS looks very suspicious

<script>
            (function() {
                $("a").on("mousedown click contextmenu", function() {
                    var b = Math.floor(100 * Math.random()) + 1
                      , a = this.href.indexOf("url=")
                      , c = this.href.indexOf("&k=");
                    -1 !== a && -1 === c && (a = this.href.substr(a + 4 + parseInt("21") + b, 1),
                    this.href += "&k=" + b + "&h=" + a)
                })
            }
            )();
        </script>
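  • I can't fully follow it, but my guess is: when something is clicked, &k= and &h= get appended to the link
  • My second guess is that the "something" is a link on the list page
  • Let's check whether the links on the list page have changed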


    image.png
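  • They really did change (compare with the screenshot under Process > Sogou WeChat list page in this article)
  • So this JS is the one
  • Translated to Python
    def get_k_h(self, url):
        # Port of the list-page JS; requires `import random` at module level
        a = url.find("url=")
        if a == -1 or "&k=" in url:  # same guard as the JS: need url=, no &k= yet
            return url
        b = random.randint(1, 100)  # Math.floor(100 * Math.random()) + 1
        pos = a + 4 + 21 + b  # 4 = len("url="), 21 = parseInt("21")
        return url + "&k=" + str(b) + "&h=" + url[pos:pos + 1]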
  • Simulate the concatenation request


    image.png
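  • Extract the fragments of the real link and join them
    def get_real_url(self, content):
        # Collect every fragment assigned via  url += '...';  (requires `import re`)
        url_text = re.findall(r"= '(\S+?)';", content)
        # Join the fragments in order to rebuild the full URL
        best_url = ''.join(url_text)
        return best_url
  • At this point we have the real link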
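
Both steps can be checked offline against the traffic captured earlier in this article. The sketch below is a standalone variant (plain functions instead of methods, with k passed in so the result is reproducible); the names add_k_h and extract_real_url are ours, and the sample is a trimmed copy of the captured concatenation response.

```python
import re


def add_k_h(url: str, k: int) -> str:
    """Append &k=<k>&h=<char> the way the list-page JS does, for a fixed k."""
    a = url.find("url=")
    if a == -1 or "&k=" in url:
        return url
    pos = a + 4 + 21 + k  # 4 = len("url="), 21 = parseInt("21") in the JS
    return f"{url}&k={k}&h={url[pos:pos + 1]}"


def extract_real_url(content: str) -> str:
    """Join the  url += '...';  fragments of the concatenation response."""
    return "".join(re.findall(r"= '(\S+?)';", content))


# Captured concatenation-request URL with its trailing &k=48&h=F removed.
captured = "https://weixin.sogou.com/link?url=dn9a_-gY295K0Rci_xozVXfdMkSQTLW6cwJThYulHEtVjXrGTiVgS05vDMvAw8toa5ZWgFUPIu5VGL8s5vDf41qXa8Fplpd9AqOK-jhx_qZB8By7fitCFPL-rOAZcbLVyiX2MqCvgw4ptMLwH26p9dm4YaL6ZYtS4NeqHS5Q_nTllRIHVAW4nrFzx6-3M-9Ud3TUgO8nEfttSG3lUTs25OkkcsTUz87GouPLeK06uSd2rPlJlESjn-37p7qSMapfYNakzj9lgtwYJSSFPgfogQ..&type=2&query=%E6%B5%8E%E5%8D%97%E5%A4%A9%E6%B0%94&token=A9871CD7955756556E6BB54BA8C36F206F25D310622EAE02"
print(add_k_h(captured, 48).endswith("&k=48&h=F"))  # True

# Trimmed copy of the captured response: only the lines that build the URL.
sample = """
        var url = '';
        url += 'http://mp.w';
        url += 'eixin.qq.co';
        url += 'm/s?src=11&';
        url += 'timestamp=1';
        url += '647226370&v';
        url += 'er=3675&sig';
        url += 'nature=ykBE';
        url += 'MHUd-2F9qhx';
        url += 'MiD*XJI9QZi';
        url += '4qDEaneJX2D';
        url += 'GYg0tibi8Jn*QtbK0-MYOhyn9AC7Jmpw-gN83DXTNFBpU37jcgNnolYWG2ZSRoKGX7hdMQC387Oj9sGtV5We9q6d59*&new=1';
"""
real = extract_real_url(sample)
print(real.startswith("http://mp.weixin.qq.com/s?src=11&timestamp=1647226370"))  # True
```

With k=48 the character picked from the captured URL is F, matching the &k=48&h=F seen in Charles. The rebuilt link uses http, which in the capture above is then redirected (301) to the https address.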

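WeChat links expire

  • Third-party services that convert temporary links into permanent ones
    1. I used to use Shenjianshou (神箭手), but it no longer exists
    2. Paid APIs offered by other platforms; I won't name them, to avoid looking like an ad
  • Get the permanent link from the WeChat Official Accounts Platform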

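Closing remarks

  • If you think this was well written, give it a like as encouragement
  • If something wasn't clear, feel free to ask me
  • If you need help, feel free to ask me
  • If you spot a mistake, please point it out
  • If you know a better approach, please share it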