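Scraping JD.com product detail data with Python: the JD API and the h5st signature (2023.08.20)

Abstract:

  Collecting JD.com product detail page data with Python through the JD API, including the h5st request signature (2023.08.20)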

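I. Principle and analysis

1. Target page

Target page

Open the target page in Chrome, press F12 to open the developer tools, and find the request that returns the product detail data, as shown below: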


2. Request URL:

https://api.m.jd.com/?appid=pc-item-soa&functionId=pc_detailpage_wareBusiness&client=pc&clientVersion=1.0.0&t=1692499380806&body=%7B%22skuId%22%3A6515029%2C%22cat%22%3A%221316%2C1381%2C1391%22%2C%22area%22%3A%2225_2258_0_0%22%2C%22shopId%22%3A%221000099941%22%2C%22venderId%22%3A1000099941%2C%22paramJson%22%3A%22%7B%5C%22platform2%5C%22%3A%5C%221%5C%22%2C%5C%22specialAttrStr%5C%22%3A%5C%22p0ppppppppppp1pppppppppppp%5C%22%2C%5C%22skuMarkStr%5C%22%3A%5C%2200%5C%22%7D%22%2C%22num%22%3A1%2C%22bbTraffic%22%3A%22%22%7D&h5st=20230820104308635%3B9m99mz6itng955u3%3Bfb5df%3Btk02w99fb1bc541lMisxd2I5N0tm7s66XeObtysPWoIPlRdJ92-R1cXDBQzPnH5QrNdDMfm18N7zHpJuWML8dwJhOORi%3Bed6048632bdcf647c9a4db5b69b49569%3B4.1%3B1692499388635%3Bee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980acde1a10effcc27fd84ad69b6a255fa2bacbfc5a0cc8222e4ac53b669906820b1461c75971601a3f031b5c1f40b721502f3b79e32d29b726ebec75a213493a818f67211b187fcf51e032e0b772bee8c70e4a1d7502aa775b148a504a31d6272cc6f198b41da73fbe26adfe0d7e3723450ed4c906efbd52e0671d7ab8bd9af7bfc208a38071126c8c70d775962c87b10b611b4f8489070e9d264c47c25dbd35aabe0addff39a3c732105c114056f93a71acfb90156d61b39e11217d5bf21c2e&x-api-eid-token=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNAAAAAMKCDJFEVIAAAAAC5FNEJMJ5UGYTMX&loginType=3&uuid=122270672.16893052418291576334291.1689305242.1692440521.1692498368.14

3. Request headers:

:authority:

api.m.jd.com

:method:

GET

:path:

/?appid=pc-item-soa&functionId=pc_detailpage_wareBusiness&client=pc&clientVersion=1.0.0&t=1692499380806&body=%7B%22skuId%22%3A6515029%2C%22cat%22%3A%221316%2C1381%2C1391%22%2C%22area%22%3A%2225_2258_0_0%22%2C%22shopId%22%3A%221000099941%22%2C%22venderId%22%3A1000099941%2C%22paramJson%22%3A%22%7B%5C%22platform2%5C%22%3A%5C%221%5C%22%2C%5C%22specialAttrStr%5C%22%3A%5C%22p0ppppppppppp1pppppppppppp%5C%22%2C%5C%22skuMarkStr%5C%22%3A%5C%2200%5C%22%7D%22%2C%22num%22%3A1%2C%22bbTraffic%22%3A%22%22%7D&h5st=20230820104308635%3B9m99mz6itng955u3%3Bfb5df%3Btk02w99fb1bc541lMisxd2I5N0tm7s66XeObtysPWoIPlRdJ92-R1cXDBQzPnH5QrNdDMfm18N7zHpJuWML8dwJhOORi%3Bed6048632bdcf647c9a4db5b69b49569%3B4.1%3B1692499388635%3Bee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980acde1a10effcc27fd84ad69b6a255fa2bacbfc5a0cc8222e4ac53b669906820b1461c75971601a3f031b5c1f40b721502f3b79e32d29b726ebec75a213493a818f67211b187fcf51e032e0b772bee8c70e4a1d7502aa775b148a504a31d6272cc6f198b41da73fbe26adfe0d7e3723450ed4c906efbd52e0671d7ab8bd9af7bfc208a38071126c8c70d775962c87b10b611b4f8489070e9d264c47c25dbd35aabe0addff39a3c732105c114056f93a71acfb90156d61b39e11217d5bf21c2e&x-api-eid-token=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNAAAAAMKCDJFEVIAAAAAC5FNEJMJ5UGYTMX&loginType=3&uuid=122270672.16893052418291576334291.1689305242.1692440521.1692498368.14

:scheme:

https

Accept:

application/json, text/javascript, */*; q=0.01

Accept-Encoding:

gzip, deflate, br

Accept-Language:

zh-CN,zh;q=0.9

Cookie:

shshshfpa=cb3af5e3-c2cf-dae5-48e3-c2331a38092a-1653253655; shshshfpx=cb3af5e3-c2cf-dae5-48e3-c2331a38092a-1653253655; __jdc=122270672; __jdu=16893052418291576334291; mba_muid=16893052418291576334291; wlfstk_smdl=4qftb0r6lu47t0sx6ovvi37no1pu4y49; 3AB9D23F7A4B3C9B=GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNA; retina=0; appCode=msc588d6d5; webp=1; visitkey=8718662230147716920; sc_width=1536; wxa_level=1; cid=9; jxsid=16924405174098442434; __jdv=122270672%7Cdirect%7C-%7Cnone%7C-%7C1692440521537; equipmentId=GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNA; fingerprint=ba1afe80c24e71237978e1b005ec6a48; deviceVersion=115.0.0.0; deviceOS=; deviceOSVersion=; deviceName=Chrome; warehistory="10072773656365,10072773656365,10072773656365,10072773656365,"; autoOpenApp_downCloseDate_autoOpenApp_autoPromptly=1692441025259_1; __wga=1692441027033.1692440547180.1691914712301.1691914712301.4.2; PPRD_P=UUID.16893052418291576334291-LOGID.1692441027044.644926152; __jd_ref_cls=MProductdetail_CouponFloorExpo; jsavif=1; __jda=122270672.16893052418291576334291.1689305242.1692440521.1692498368.14; token=a4d78cd04f402b3f7ad6a29e8af8aa6f,2,940277; __tk=krazkYhsAcgzjrhtAuewjueDjufpArg5BVoz4zttAzG,2,940277; 3AB9D23F7A4B3CSS=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNAAAAAMKCDJFEVIAAAAAC5FNEJMJ5UGYTMX; _gia_d=1; __jdb=122270672.2.16893052418291576334291|14.1692498368; shshshfpb=xbVnfPmoZnca-0u5O8YJzHQ; areaId=25; ipLoc-djd=25-2258-0-0

Origin:

https://item.jd.com

Referer:

https://item.jd.com/

Sec-Ch-Ua:

"Not/A)Brand";v="99", "Google Chrome";v="115", "Chromium";v="115"

Sec-Ch-Ua-Mobile:

?0

Sec-Ch-Ua-Platform:

"Windows"

Sec-Fetch-Dest:

empty

Sec-Fetch-Mode:

cors

Sec-Fetch-Site:

same-site

User-Agent:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36

X-Referer-Page:

https://item.jd.com/6515029.html

X-Rp-Client:

h5_1.0.0

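4. Response data:

It contains the product image URLs, the price, the title and other information, which is exactly what we need.

(The response is very large; only a small excerpt is shown.)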

{
    "extendWarrantyInfo": {
        "descUrl": "https://baozhang.jd.com/static/serviceDesc",
        "detailUrl": "https://b.jr.jd.com/service/serveIntroduce/#/introduce3?mainSkuId={mainSkuId}&brandId={brandId}&thirdCategoryId={cid3}&bindSkuId={bindSku}",
        "serviceItems": [
            {
            ......

(product detail data)
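The excerpt above is only the top of a much larger JSON object. A minimal Python sketch for reading nested fields out of such a response, assuming the response body is plain JSON like the excerpt; the helper get_path and the tiny sample string are illustrative, and only the extendWarrantyInfo.descUrl field and its value are taken from the excerpt.

```python
import json
from typing import Any


def get_path(obj: Any, path: str, default: Any = None) -> Any:
    """Follow a dotted path such as "a.b.0.c" through nested dicts and lists."""
    for part in path.split("."):
        if isinstance(obj, dict) and part in obj:
            obj = obj[part]
        elif isinstance(obj, list) and part.isdigit() and int(part) < len(obj):
            obj = obj[int(part)]
        else:
            return default  # missing key or index: fall back instead of raising
    return obj


# Stand-in for the real response, containing only fields visible in the excerpt.
sample = '{"extendWarrantyInfo": {"descUrl": "https://baozhang.jd.com/static/serviceDesc", "serviceItems": []}}'
data = json.loads(sample)

print(get_path(data, "extendWarrantyInfo.descUrl"))
print(get_path(data, "extendWarrantyInfo.serviceItems.0", "no items"))
```

Returning a default instead of raising makes it easy to tell an unexpected or empty response apart from a crash.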

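5. Data analysis

(1) body parameter

Analysis shows that the body parameter of the URL carries the request details. It is URL-encoded; decoded, it reads:

{"skuId":6515029,"cat":"1316,1381,1391","area":"25_2258_0_0","shopId":"1000099941","venderId":1000099941,"paramJson":"{\"platform2\":\"1\",\"specialAttrStr\":\"p0ppppppppppp1pppppppppppp\",\"skuMarkStr\":\"00\"}","num":1,"bbTraffic":""}

"skuId":6515029 is the product ID and "shopId":"1000099941" is the shop ID. The other parameters can be kept fixed: they include the category path (cat) and the delivery region (area, which mirrors the ipLoc-djd=25-2258-0-0 cookie), plus fields tied to the browsing environment.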
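The decoding can be reproduced with Python's standard library. A minimal sketch: it URL-decodes the captured body value, parses the JSON (paramJson is itself a JSON string nested inside it), and re-encodes it; with compact separators the round trip reproduces the captured value exactly. decode_body and encode_body are illustrative names.

```python
import json
from urllib.parse import quote, unquote

# body value copied from the captured URL (split across lines only for readability)
ENCODED_BODY = (
    "%7B%22skuId%22%3A6515029%2C%22cat%22%3A%221316%2C1381%2C1391%22%2C%22area%22%3A%2225_2258_0_0%22"
    "%2C%22shopId%22%3A%221000099941%22%2C%22venderId%22%3A1000099941%2C%22paramJson%22%3A%22%7B%5C%22"
    "platform2%5C%22%3A%5C%221%5C%22%2C%5C%22specialAttrStr%5C%22%3A%5C%22p0ppppppppppp1pppppppppppp"
    "%5C%22%2C%5C%22skuMarkStr%5C%22%3A%5C%2200%5C%22%7D%22%2C%22num%22%3A1%2C%22bbTraffic%22%3A%22%22%7D"
)


def decode_body(encoded: str) -> dict:
    """URL-decode the body query parameter and parse the resulting JSON."""
    return json.loads(unquote(encoded))


def encode_body(body: dict) -> str:
    """Serialize like JSON.stringify (no spaces) and percent-encode every reserved character."""
    return quote(json.dumps(body, separators=(",", ":")), safe="")


body = decode_body(ENCODED_BODY)
print(body["skuId"], body["shopId"])          # 6515029 1000099941
param_json = json.loads(body["paramJson"])    # paramJson is a JSON-encoded string
print(param_json["specialAttrStr"])
print(encode_body(body) == ENCODED_BODY)      # True: the round trip reproduces the captured value
```

To query another product one would change skuId (and shopId/venderId) before re-encoding; the request would still need a fresh h5st signature.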

(2) appid parameter

Indicates the interface category. Observed values:

appid=pc-item-soa    PC product-detail data

appid=item-v3        item data, version 3

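(3) functionId parameter

Indicates what the interface does:

functionId=pc_detailpage_wareBusiness          PC product page details

functionId=pc_club_productCommentSummaries     PC product review data

functionId=recDivinerApi                       product-page related data

functionId=pctradesoa_getprice                 price information

Each functionId expects different parameters inside body.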
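A sketch of assembling the request URL for a given appid/functionId pair in Python, assuming an h5st value has already been produced by the signer. build_query_url is a hypothetical helper; the parameter order and the client/clientVersion values are copied from the captured URL, while x-api-eid-token, loginType and uuid are left out. Whether the server accepts a request without loginType and uuid has not been verified here.

```python
import json
from urllib.parse import parse_qs, quote, urlencode, urlsplit

API_BASE = "https://api.m.jd.com/"


def build_query_url(appid: str, function_id: str, body: dict, h5st: str, t_ms: int) -> str:
    """Assemble the request URL (hypothetical helper).

    h5st must already have been produced by the signer; nothing is signed here.
    """
    params = {
        "appid": appid,
        "functionId": function_id,
        "client": "pc",            # copied from the captured URL
        "clientVersion": "1.0.0",  # copied from the captured URL
        "t": t_ms,
        "body": json.dumps(body, separators=(",", ":")),
        "h5st": h5st,
    }
    # quote with safe="" percent-encodes {, ", :, ; etc., as in the captured URL
    return API_BASE + "?" + urlencode(params, quote_via=quote, safe="")


url = build_query_url(
    "pc-item-soa",
    "pc_detailpage_wareBusiness",
    {"skuId": 6515029},   # shortened body, for illustration only
    "PLACEHOLDER;H5ST",   # stand-in for a real signature
    1692499380806,
)
print(url)
print(parse_qs(urlsplit(url).query)["functionId"])
```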

(4) x-api-eid-token parameter

x-api-eid-token=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNAAAAAMKCDJFEVIAAAAAC5FNEJMJ5UGYTMX

In testing, the server did not validate this parameter, so it can be omitted without affecting data collection.
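Since x-api-eid-token is not checked, it can simply be dropped from a captured URL. A small standard-library sketch; drop_query_param is a hypothetical helper name. Re-encoding uses quote with safe="" so that characters such as ; inside h5st stay percent-encoded (%3B), as in the captured request.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def drop_query_param(url: str, name: str) -> str:
    """Return url with every occurrence of the query parameter `name` removed."""
    parts = urlsplit(url)
    # keep_blank_values preserves parameters whose value is empty
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    # quote (instead of the default quote_plus) with safe="" re-encodes every reserved character
    return urlunsplit(parts._replace(query=urlencode(kept, quote_via=quote, safe="")))


short_url = "https://api.m.jd.com/?appid=pc-item-soa&x-api-eid-token=jdd03ABC&loginType=3"
print(drop_query_param(short_url, "x-api-eid-token"))
```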

(5) h5st parameter (request signature)

h5st=20230820104308635%3B9m99mz6itng955u3%3Bfb5df%3Btk02w99fb1bc541lMisxd2I5N0tm7s66XeObtysPWoIPlRdJ92-R1cXDBQzPnH5QrNdDMfm18N7zHpJuWML8dwJhOORi%3Bed6048632bdcf647c9a4db5b69b49569%3B4.1%3B1692499388635%3Bee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980acde1a10effcc27fd84ad69b6a255fa2bacbfc5a0cc8222e4ac53b669906820b1461c75971601a3f031b5c1f40b721502f3b79e32d29b726ebec75a213493a818f67211b187fcf51e032e0b772bee8c70e4a1d7502aa775b148a504a31d6272cc6f198b41da73fbe26adfe0d7e3723450ed4c906efbd52e0671d7ab8bd9af7bfc208a38071126c8c70d775962c87b10b611b4f8489070e9d264c47c25dbd35aabe0addff39a3c732105c114056f93a71acfb90156d61b39e11217d5bf21c2e

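h5st is JD's request-signature parameter, and every interface requires it. The server returns data only when the signature is valid; without a valid signature, repeated requests only occasionally return data.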
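The h5st value is a ';'-separated list (URL-encoded as %3B in the request). A small sketch that splits it into fields and checks one observation from the two samples in this article: the first field is the millisecond timestamp from the seventh field, formatted as yyyyMMddHHmmssSSS in Beijing time (UTC+8). The field labels are guesses, not JD's names, and parsing a signature does not produce one.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import unquote

# Descriptive guesses based on the samples in this article, not JD's own names.
FIELD_NAMES = [
    "time_str",  # yyyyMMddHHmmssSSS in UTC+8
    "field2",    # identical in both samples
    "field3",    # "fb5df" in both samples
    "token",     # starts with "tk0"
    "field5",    # 32 hex characters
    "version",   # "4.1"
    "ts_ms",     # millisecond Unix timestamp
    "field8",    # long hex string
]
BEIJING = timezone(timedelta(hours=8))


def parse_h5st(h5st: str) -> dict:
    """Split an h5st value (URL-encoded or not) into its eight fields."""
    parts = unquote(h5st).split(";")
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(parts)}")
    return dict(zip(FIELD_NAMES, parts))


def ms_to_time_str(ts_ms: int) -> str:
    """Format a millisecond timestamp as yyyyMMddHHmmssSSS in UTC+8 (integer math, no float rounding)."""
    dt = datetime.fromtimestamp(ts_ms // 1000, tz=BEIJING)
    return dt.strftime("%Y%m%d%H%M%S") + f"{ts_ms % 1000:03d}"


# h5st from the captured URL above; the last field is cut short here for readability.
SAMPLE = (
    "20230820104308635%3B9m99mz6itng955u3%3Bfb5df%3B"
    "tk02w99fb1bc541lMisxd2I5N0tm7s66XeObtysPWoIPlRdJ92-R1cXDBQzPnH5QrNdDMfm18N7zHpJuWML8dwJhOORi%3B"
    "ed6048632bdcf647c9a4db5b69b49569%3B4.1%3B1692499388635%3B"
    "ee3cf7f6b94dc20e9265d83066bb9ceece4bb89e"
)

fields = parse_h5st(SAMPLE)
print(fields["version"])
print(ms_to_time_str(int(fields["ts_ms"])) == fields["time_str"])
```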

So to collect data we must produce a valid h5st signature. The signing process is analyzed below:

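II. h5st signature analysis

1. Locating the h5st signing code

Do a global search for getDataColor. Why getDataColor? Because the h5st signing code sits right next to this function.

Set a breakpoint and refresh the page, as in the screenshot below:


(Figure 2) Location of h5st

The signing process can be read directly from the code: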

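    try {
        var d = JSON.parse(JSON.stringify(r));   // deep copy of the request parameters r
        d.body = SHA256(s).toString();           // body replaced by the SHA-256 hex digest of s
        window.PSign.sign(d).then(function (e) {
            r.h5st = encodeURI(e.h5st);          // signature written back into r
            // ......
        });
        // ......
    } // (catch block omitted in this excerpt)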

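Signing call: window.PSign.sign(d);

The result is then written back: r.h5st = encodeURI(e.h5st);

Signing is asynchronous: sign() returns a Promise, and h5st is only available inside the then callback.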

2. The signing parameters in detail:

(1) body parameter

{"skuId":6515029,"cat":"1316,1381,1391","area":"25_2258_2261_6568","shopId":"1000099941","venderId":1000099941,"paramJson":"{\"platform2\":\"1\",\"specialAttrStr\":\"p0ppppppppppp1pppppppppppp\",\"skuMarkStr\":\"00\"}","num":1,"bbTraffic":""}

(2) d parameter:

{
    "appid": "pc-item-soa",
    "functionId": "pc_detailpage_wareBusiness",
    "client": "pc",
    "clientVersion": "1.0.0",
    "t": 1692498783586,
    "body": "dddd48059b91f87eb42b080167bd70b5303b3df8c4b71a3967372fcda60cd496"
}

d.body = SHA256(s).toString()

Press F11 to step into the call and locate the SHA256 implementation, then extract it:


(SHA256 implementation)

(3) t parameter

t: a

a = (new Date).getTime()

t is a timestamp in milliseconds.
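Taken together, (2) and (3) describe how d is built. A Python sketch of the same construction, assuming that s in SHA256(s) is the compact JSON.stringify output of the body object; check that assumption by comparing the digest with the d.body value shown in the debugger for the same body. build_sign_params is a hypothetical name, and client/clientVersion are copied from the sample d. This only prepares the input to window.PSign.sign(d); it does not compute h5st.

```python
import hashlib
import json
import time
from typing import Optional


def build_sign_params(appid: str, function_id: str, body: dict, now_ms: Optional[int] = None) -> dict:
    """Build the object that the page passes to window.PSign.sign(d)."""
    if now_ms is None:
        # same unit as JavaScript's (new Date).getTime(): milliseconds since the Unix epoch
        now_ms = int(time.time() * 1000)
    # compact separators and ensure_ascii=False approximate JSON.stringify's output
    s = json.dumps(body, separators=(",", ":"), ensure_ascii=False)
    return {
        "appid": appid,
        "functionId": function_id,
        "client": "pc",
        "clientVersion": "1.0.0",
        "t": now_ms,
        "body": hashlib.sha256(s.encode("utf-8")).hexdigest(),  # d.body = SHA256(s).toString()
    }


d = build_sign_params("pc-item-soa", "pc_detailpage_wareBusiness", {"skuId": 6515029})
print(sorted(d))
print(len(d["body"]))  # 64 hex characters
```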

That completes the analysis of the signing parameters; next, locate the h5st signing algorithm itself.

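3. The h5st signing algorithm

Set a breakpoint at window.PSign.sign(d) and press F11 to step into it:


(Figure 3) h5st signing algorithm

Once inside the JS file that computes h5st, save the whole file. It is named js_security_v3_0.1.4.js

(contents of js_security_v3_0.1.4.js)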

4. Value returned by the h5st signer:

{
    "appid": "pc-item-soa",
    "functionId": "pc_detailpage_wareBusiness",
    "client": "pc",
    "clientVersion": "1.0.0",
    "t": 1692498783586,
    "body": "dddd48059b91f87eb42b080167bd70b5303b3df8c4b71a3967372fcda60cd496",
    "_stk": "appid,body,client,clientVersion,functionId,t",
    "_ste": 1,
    "h5st": "20230820131419818;9m99mz6itng955u3;fb5df;tk03w9d441cbf18nk990HQLMH0ehQyR5j8EBXtSrYlGtY8KzYUkKCoUctg6u1pqtBeAqYw-t1yFcromGuN17RlgILtyk;65001318ffed0d17ee21652afb01a996;4.1;1692508459818;ee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980acde1a10effcc27fd84ad69b6a255fa2bacbfc5a0cc8222e4ac53b669906820b1461c75971601a3f031b5c1f40b721502f3b79e32d29b726ebec75a213493a818f67211b187fcf51e032e0b772bee8c70e4a1d7502aa775b148a504a31d627d6db4fde5974622b566cdace3d88a8999574369ad4a27c752e256a8a6d92a5fdfa8633dae1aa5d17f9ea6a859ed6b22c920d7881227b2f7f61f3bbf82c17afd340c42be154e8e3ad1d39c2d8ba94acb84c25299080b5545acc894168647303ed"

}

The h5st field is the one we need.
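In this sample, _stk is exactly the list of the six signed parameter names in alphabetical order, and h5st is a plain ';'-separated string; in the captured URL the semicolons appear percent-encoded as %3B. A small sketch that checks the ordering observation and prepares h5st as a query value. Reading _stk as "sorted keys" is an inference from this single sample, and stk_of is a hypothetical name.

```python
from urllib.parse import quote

# Signer output copied from the sample above, with the long h5st value shortened.
signed = {
    "appid": "pc-item-soa",
    "functionId": "pc_detailpage_wareBusiness",
    "client": "pc",
    "clientVersion": "1.0.0",
    "t": 1692498783586,
    "body": "dddd48059b91f87eb42b080167bd70b5303b3df8c4b71a3967372fcda60cd496",
    "_stk": "appid,body,client,clientVersion,functionId,t",
    "_ste": 1,
    "h5st": "20230820131419818;9m99mz6itng955u3;fb5df;"
            "tk03w9d441cbf18nk990HQLMH0ehQyR5j8EBXtSrYlGtY8KzYUkKCoUctg6u1pqtBeAqYw-t1yFcromGuN17RlgILtyk;"
            "65001318ffed0d17ee21652afb01a996;4.1;1692508459818;ee3cf7f6",
}


def stk_of(params: dict) -> str:
    """Sorted, comma-joined names of the signed parameters (skips _stk, _ste and h5st)."""
    signed_keys = [k for k in params if not k.startswith("_") and k != "h5st"]
    return ",".join(sorted(signed_keys))


print(stk_of(signed) == signed["_stk"])
h5st_param = quote(signed["h5st"], safe="")  # ';' becomes %3B, as in the captured URL
print(h5st_param[:40])
```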

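III. Calling the signer from Python and other languages

js_security_v3_0.1.4.js contains the actual signing code, but it cannot be called from Python as-is: it fails with an error that window is missing, so the browser environment has to be patched in first.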
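Patching the environment usually means defining the browser globals the file expects before its own code runs. A minimal sketch of that idea, assuming the signer only touches window, document and navigator (the real file almost certainly needs more, typically found by running it and fixing each "not defined" error in turn). with_env_stub and compile_signer are hypothetical helpers, not the author's h5st.js.

```python
import json


def with_env_stub(js_source: str, user_agent: str) -> str:
    """Prepend a tiny, hypothetical browser-environment stub to the signer source."""
    stub = (
        "var window = globalThis;\n"
        "var document = { cookie: '' };\n"
        "var navigator = { userAgent: " + json.dumps(user_agent) + " };\n"
        "window.document = document;\n"
    )
    return stub + js_source


def compile_signer(path: str, user_agent: str):
    """Compile the stubbed signer with PyExecJS (needs a JS runtime such as Node.js)."""
    import execjs  # imported lazily so with_env_stub works without PyExecJS installed
    with open(path, encoding="utf-8") as f:
        return execjs.compile(with_env_stub(f.read(), user_agent))


print(with_env_stub("/* signer code */", "Mozilla/5.0")[:60])
```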


The Python code below calls the signer and then requests the interface (with the environment already patched):

# -*- coding: UTF-8 -*-
import requests
import execjs  # PyExecJS: runs JavaScript through an external runtime such as Node.js


def savetofile(text, filename):
    # Save the raw response text to a UTF-8 file
    with open(filename, "w", encoding="utf-8") as file:
        file.write(text)


def jd(skuid):
    appid = 'item-v3'
    functionId = 'recDivinerApi'
    body = {"lid": 27, "lim": 15, "ec": "utf-8", "uuid": "16900368971511636315768", "pin": "", "p": 902029, "sku": skuid, "ck": "pin,ipLocation,atw,aview", "c1": 1316, "c2": 1387, "c3": 11932, "securityToken": "iJJJBrR7BAxWWavOluQxmMQ", "clientChannel": "3", "clientPageId": "item.jd.com"}

    # h5st.js is the signing file with the browser environment already patched in;
    # its sign(appid, functionId, body) returns the complete signed request URL
    with open("h5st.js", "r", encoding="utf-8") as js_file:
        exc = execjs.compile(js_file.read())
    url = exc.call("sign", appid, functionId, body)
    print('url=' + url)

    headers = {
        # ":authority" is an HTTP/2 pseudo-header, not a normal header; requests takes the host from the URL
        "Accept": "application/json, text/javascript, */*; q=0.01",
        # "br" omitted: requests only decodes Brotli when the brotli package is installed
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN,zh;q=0.9",
        # Cookie copied from the author's browser session; substitute your own
        "Cookie": "shshshfpb=i0ZU6VlHi9tt1RukWDDyR0w; 3AB9D23F7A4B3C9B=GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNA; shshshfpa=cb3af5e3-c2cf-dae5-48e3-c2331a38092a-1653253655; shshshfpx=cb3af5e3-c2cf-dae5-48e3-c2331a38092a-1653253655; __jdc=122270672; __jdv=122270672|direct|-|none|-|1689305241830; __jdu=16893052418291576334291; areaId=25; ipLoc-djd=25-2258-2261-6568; token=7a3a5010c8ea7250057d9168270daacd,2,939221; __tk=be32047e11adf495830ad564f7c34cd6,2,939221; 3AB9D23F7A4B3CSS=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTRiDY9CRQSU93J9SUTiPmFy3PTP7N8itsNd7DLuiPzfoEjAAACXCBKUWUQMP7FMX; _gia_d=1; jsavif=1; __jda=122270672.16893052418291576334291.1689305242.1690550636.1690599310.7; __jdb=122270672.1.16893052418291576334291|7.1690599310",
        "Origin": "https://item.jd.com",
        "Referer": "https://item.jd.com/",
        "Sec-Ch-Ua": "\"Not.A/Brand\";v=\"8\", \"Chromium\";v=\"114\", \"Google Chrome\";v=\"114\"",
        "Sec-Ch-Ua-Mobile": "?0",
        "Sec-Ch-Ua-Platform": "\"Windows\"",
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-site",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
        "X-Referer-Page": f"https://item.jd.com/{skuid}.html",
        "X-Rp-Client": "h5_1.0.0",
    }

    # timeout so a stalled request does not hang the script
    res = requests.get(url=url, headers=headers, timeout=10)
    print(res)
    text = res.text
    savetofile(text, "sku.txt")
    print(text)
    return text


if __name__ == '__main__':
    print('Latest h5st v4.1 signature: fetching product details')
    jd(100019322424)

IV. Product details returned successfully in Python


Done!

© Copyright belongs to the author. For reprints or content partnerships, please contact the author.