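Abstract:
Collecting JD.com product detail page data with Python: the JD API endpoint and the JD h5st signature (2023.08.20)
I. How it works: analysis
1. Target page
Open the product page in Chrome, press F12 to open DevTools, and locate the request that returns the product detail data:
2. Request URL: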
https://api.m.jd.com/?appid=pc-item-soa&functionId=pc_detailpage_wareBusiness&client=pc&clientVersion=1.0.0&t=1692499380806&body=%7B%22skuId%22%3A6515029%2C%22cat%22%3A%221316%2C1381%2C1391%22%2C%22area%22%3A%2225_2258_0_0%22%2C%22shopId%22%3A%221000099941%22%2C%22venderId%22%3A1000099941%2C%22paramJson%22%3A%22%7B%5C%22platform2%5C%22%3A%5C%221%5C%22%2C%5C%22specialAttrStr%5C%22%3A%5C%22p0ppppppppppp1pppppppppppp%5C%22%2C%5C%22skuMarkStr%5C%22%3A%5C%2200%5C%22%7D%22%2C%22num%22%3A1%2C%22bbTraffic%22%3A%22%22%7D&h5st=20230820104308635%3B9m99mz6itng955u3%3Bfb5df%3Btk02w99fb1bc541lMisxd2I5N0tm7s66XeObtysPWoIPlRdJ92-R1cXDBQzPnH5QrNdDMfm18N7zHpJuWML8dwJhOORi%3Bed6048632bdcf647c9a4db5b69b49569%3B4.1%3B1692499388635%3Bee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980acde1a10effcc27fd84ad69b6a255fa2bacbfc5a0cc8222e4ac53b669906820b1461c75971601a3f031b5c1f40b721502f3b79e32d29b726ebec75a213493a818f67211b187fcf51e032e0b772bee8c70e4a1d7502aa775b148a504a31d6272cc6f198b41da73fbe26adfe0d7e3723450ed4c906efbd52e0671d7ab8bd9af7bfc208a38071126c8c70d775962c87b10b611b4f8489070e9d264c47c25dbd35aabe0addff39a3c732105c114056f93a71acfb90156d61b39e11217d5bf21c2e&x-api-eid-token=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNAAAAAMKCDJFEVIAAAAAC5FNEJMJ5UGYTMX&loginType=3&uuid=122270672.16893052418291576334291.1689305242.1692440521.1692498368.14
3. Request headers:
:authority:
api.m.jd.com
:method:
GET
:path:
/?appid=pc-item-soa&functionId=pc_detailpage_wareBusiness&client=pc&clientVersion=1.0.0&t=1692499380806&body=%7B%22skuId%22%3A6515029%2C%22cat%22%3A%221316%2C1381%2C1391%22%2C%22area%22%3A%2225_2258_0_0%22%2C%22shopId%22%3A%221000099941%22%2C%22venderId%22%3A1000099941%2C%22paramJson%22%3A%22%7B%5C%22platform2%5C%22%3A%5C%221%5C%22%2C%5C%22specialAttrStr%5C%22%3A%5C%22p0ppppppppppp1pppppppppppp%5C%22%2C%5C%22skuMarkStr%5C%22%3A%5C%2200%5C%22%7D%22%2C%22num%22%3A1%2C%22bbTraffic%22%3A%22%22%7D&h5st=20230820104308635%3B9m99mz6itng955u3%3Bfb5df%3Btk02w99fb1bc541lMisxd2I5N0tm7s66XeObtysPWoIPlRdJ92-R1cXDBQzPnH5QrNdDMfm18N7zHpJuWML8dwJhOORi%3Bed6048632bdcf647c9a4db5b69b49569%3B4.1%3B1692499388635%3Bee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980acde1a10effcc27fd84ad69b6a255fa2bacbfc5a0cc8222e4ac53b669906820b1461c75971601a3f031b5c1f40b721502f3b79e32d29b726ebec75a213493a818f67211b187fcf51e032e0b772bee8c70e4a1d7502aa775b148a504a31d6272cc6f198b41da73fbe26adfe0d7e3723450ed4c906efbd52e0671d7ab8bd9af7bfc208a38071126c8c70d775962c87b10b611b4f8489070e9d264c47c25dbd35aabe0addff39a3c732105c114056f93a71acfb90156d61b39e11217d5bf21c2e&x-api-eid-token=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNAAAAAMKCDJFEVIAAAAAC5FNEJMJ5UGYTMX&loginType=3&uuid=122270672.16893052418291576334291.1689305242.1692440521.1692498368.14
:scheme:
https
Accept:
application/json, text/javascript, */*; q=0.01
Accept-Encoding:
gzip, deflate, br
Accept-Language:
zh-CN,zh;q=0.9
Cookie:
shshshfpa=cb3af5e3-c2cf-dae5-48e3-c2331a38092a-1653253655; shshshfpx=cb3af5e3-c2cf-dae5-48e3-c2331a38092a-1653253655; __jdc=122270672; __jdu=16893052418291576334291; mba_muid=16893052418291576334291; wlfstk_smdl=4qftb0r6lu47t0sx6ovvi37no1pu4y49; 3AB9D23F7A4B3C9B=GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNA; retina=0; appCode=msc588d6d5; webp=1; visitkey=8718662230147716920; sc_width=1536; wxa_level=1; cid=9; jxsid=16924405174098442434; __jdv=122270672%7Cdirect%7C-%7Cnone%7C-%7C1692440521537; equipmentId=GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNA; fingerprint=ba1afe80c24e71237978e1b005ec6a48; deviceVersion=115.0.0.0; deviceOS=; deviceOSVersion=; deviceName=Chrome; warehistory="10072773656365,10072773656365,10072773656365,10072773656365,"; autoOpenApp_downCloseDate_autoOpenApp_autoPromptly=1692441025259_1; __wga=1692441027033.1692440547180.1691914712301.1691914712301.4.2; PPRD_P=UUID.16893052418291576334291-LOGID.1692441027044.644926152; __jd_ref_cls=MProductdetail_CouponFloorExpo; jsavif=1; __jda=122270672.16893052418291576334291.1689305242.1692440521.1692498368.14; token=a4d78cd04f402b3f7ad6a29e8af8aa6f,2,940277; __tk=krazkYhsAcgzjrhtAuewjueDjufpArg5BVoz4zttAzG,2,940277; 3AB9D23F7A4B3CSS=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNAAAAAMKCDJFEVIAAAAAC5FNEJMJ5UGYTMX; _gia_d=1; __jdb=122270672.2.16893052418291576334291|14.1692498368; shshshfpb=xbVnfPmoZnca-0u5O8YJzHQ; areaId=25; ipLoc-djd=25-2258-0-0
Origin:
https://item.jd.com
Referer:
https://item.jd.com/
Sec-Ch-Ua:
"Not/A)Brand";v="99", "Google Chrome";v="115", "Chromium";v="115"
Sec-Ch-Ua-Mobile:
?0
Sec-Ch-Ua-Platform:
"Windows"
Sec-Fetch-Dest:
empty
Sec-Fetch-Mode:
cors
Sec-Fetch-Site:
same-site
User-Agent:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
X-Referer-Page:
https://item.jd.com/6515029.html
X-Rp-Client:
h5_1.0.0
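The header dump above is in the form DevTools copies it: a name line ending in a colon, then a value line. A minimal sketch of turning such a dump into a dict that an HTTP client can send is shown here. The function name headers_from_devtools is illustrative, and the sketch assumes every header has a non-empty value. HTTP/2 pseudo-headers (names starting with a colon, such as :authority) are skipped because they are not ordinary request headers.

```python
def headers_from_devtools(dump: str) -> dict:
    """Convert alternating 'Name:' / 'value' lines into a headers dict."""
    lines = [line.strip() for line in dump.splitlines() if line.strip()]
    headers = {}
    # Even positions hold names, odd positions hold the matching values.
    for name_line, value in zip(lines[0::2], lines[1::2]):
        name = name_line.rstrip(":")
        if name.startswith(":"):
            # Pseudo-header such as :authority or :method; not sent as a header.
            continue
        headers[name] = value
    return headers


sample = """
:authority:
api.m.jd.com
:method:
GET
Accept-Language:
zh-CN,zh;q=0.9
Origin:
https://item.jd.com
X-Rp-Client:
h5_1.0.0
"""
headers = headers_from_devtools(sample)
print(headers)
```

Building the dict this way avoids hand-editing header names and values, which is where stray spaces tend to creep in.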
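4. Response data:
The response includes the product image URLs, price, title and other fields, which is exactly what we need.
(The payload is large; only a small excerpt is shown.)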
{
"extendWarrantyInfo": {
"descUrl": "https://baozhang.jd.com/static/serviceDesc",
"detailUrl": "https://b.jr.jd.com/service/serveIntroduce/#/introduce3?mainSkuId={mainSkuId}&brandId={brandId}&thirdCategoryId={cid3}&bindSkuId={bindSku}",
"serviceItems": [
{
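(product detail data)
5. Data analysis
(1) The body parameter
The body parameter in the URL carries the request details. It is URL-encoded; decoded, it reads:
{"skuId":6515029,"cat":"1316,1381,1391","area":"25_2258_0_0","shopId":"1000099941","venderId":1000099941,"paramJson":"{\"platform2\":\"1\",\"specialAttrStr\":\"p0ppppppppppp1pppppppppppp\",\"skuMarkStr\":\"00\"}","num":1,"bbTraffic":""}
"skuId":6515029 is the product ID and "shopId":"1000099941" is the shop ID. The other fields describe the category (cat), the delivery area (area, which matches the ipLoc-djd cookie) and the client environment; they can be kept fixed.
(2) The appid parameter
Identifies the API category. Observed values:
appid=pc-item-soa  PC product detail data;
appid=item-v3  data version v3;
(3) The functionId parameter
Identifies what the endpoint does:
functionId=pc_detailpage_wareBusiness  PC product detail page
functionId=pc_club_productCommentSummaries  PC review summary data
functionId=recDivinerApi  related data shown on the product page
functionId=pctradesoa_getprice  price information
The fields inside body differ from one functionId to another.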
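The parameters in (1) to (3) can be taken apart and put back together with the standard library. The sketch below assembles the unsigned query string and decodes body back out of a URL. The names build_query, read_body and API_BASE are illustrative, the sample body is shortened (paramJson and bbTraffic are left out), and h5st is deliberately not included, so this URL alone is not expected to return data reliably.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

API_BASE = "https://api.m.jd.com/"


def build_query(appid: str, function_id: str, body: dict, t: int) -> str:
    """Assemble the unsigned request URL (no h5st)."""
    params = {
        "appid": appid,
        "functionId": function_id,
        "client": "pc",
        "clientVersion": "1.0.0",
        "t": t,
        # Compact separators mirror the captured body, which contains no spaces.
        "body": json.dumps(body, separators=(",", ":")),
    }
    return API_BASE + "?" + urlencode(params)


def read_body(url: str) -> dict:
    """Decode the URL-encoded body parameter back into a dict."""
    query = parse_qs(urlsplit(url).query)
    return json.loads(query["body"][0])


body = {
    "skuId": 6515029,
    "cat": "1316,1381,1391",
    "area": "25_2258_0_0",
    "shopId": "1000099941",
    "venderId": 1000099941,
    "num": 1,
}
url = build_query("pc-item-soa", "pc_detailpage_wareBusiness", body, 1692499380806)
print(url)
print(read_body(url)["skuId"])
```

Switching to another endpoint then means changing appid, functionId and the body fields, as described above.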
(4) The x-api-eid-token parameter
x-api-eid-token=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNAAAAAMKCDJFEVIAAAAAC5FNEJMJ5UGYTMX
In testing, the server did not validate this parameter, so it can be omitted without affecting data collection.
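If a captured URL is being reused, the parameter can be stripped with the standard library. This is a small sketch; drop_query_param is an illustrative name and the sample URL is shortened, with EXAMPLE standing in for the real token.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_query_param(url: str, name: str) -> str:
    """Return url with every occurrence of the query parameter `name` removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


sample_url = (
    "https://api.m.jd.com/?appid=pc-item-soa&client=pc"
    "&x-api-eid-token=EXAMPLE&loginType=3"
)
print(drop_query_param(sample_url, "x-api-eid-token"))
```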
(5) The h5st parameter (request signature)
h5st=20230820104308635%3B9m99mz6itng955u3%3Bfb5df%3Btk02w99fb1bc541lMisxd2I5N0tm7s66XeObtysPWoIPlRdJ92-R1cXDBQzPnH5QrNdDMfm18N7zHpJuWML8dwJhOORi%3Bed6048632bdcf647c9a4db5b69b49569%3B4.1%3B1692499388635%3Bee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980acde1a10effcc27fd84ad69b6a255fa2bacbfc5a0cc8222e4ac53b669906820b1461c75971601a3f031b5c1f40b721502f3b79e32d29b726ebec75a213493a818f67211b187fcf51e032e0b772bee8c70e4a1d7502aa775b148a504a31d6272cc6f198b41da73fbe26adfe0d7e3723450ed4c906efbd52e0671d7ab8bd9af7bfc208a38071126c8c70d775962c87b10b611b4f8489070e9d264c47c25dbd35aabe0addff39a3c732105c114056f93a71acfb90156d61b39e11217d5bf21c2e
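h5st is JD's request-signature parameter, and every endpoint requires it. The server returns data reliably only when the signature is correct; with a wrong signature, repeated requests return data only occasionally.
So collecting data requires a correct h5st signature. How it is generated is analyzed in section II.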
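Before looking at the signer, it helps to see that the captured h5st value is a list of fields joined by semicolons (%3B once URL-encoded). The sketch below splits it and checks one observation from this capture: the first field equals the seventh field (a millisecond timestamp) written as a UTC+8 date string. The field meanings are inferred from the captured value only, not from any documentation. The helper names are illustrative and the long final field is shortened.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import unquote


def split_h5st(raw: str) -> list:
    """URL-decode an h5st value and split it into its semicolon-separated fields."""
    return unquote(raw).split(";")


def local_stamp(ms: int, utc_offset_hours: int = 8) -> str:
    """Format a millisecond timestamp as yyyyMMddHHmmssSSS in the given UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    moment = datetime.fromtimestamp(ms // 1000, tz)
    return moment.strftime("%Y%m%d%H%M%S") + f"{ms % 1000:03d}"


# First seven fields copied from the capture; the eighth is shortened for readability.
captured = (
    "20230820104308635%3B9m99mz6itng955u3%3Bfb5df%3B"
    "tk02w99fb1bc541lMisxd2I5N0tm7s66XeObtysPWoIPlRdJ92-R1cXDBQzPnH5QrNdDMfm18N7zHpJuWML8dwJhOORi%3B"
    "ed6048632bdcf647c9a4db5b69b49569%3B4.1%3B1692499388635%3Bee3cf7f6"
)
fields = split_h5st(captured)
print(len(fields), fields[5])
print(local_stamp(int(fields[6])) == fields[0])
```

The sixth field, 4.1, looks like the signature version, which would explain why a signer saved from one version may stop working after the site updates.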
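II. Analyzing the h5st signature
1. Locating the h5st signing code
Search all sources for getDataColor; the h5st logic sits right next to this function.
Set a breakpoint there and reload the page (screenshot below):
(Figure 2) Location of the h5st code
The signing steps are visible directly: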
try {
var d = JSON.parse(JSON.stringify(r));
d.body = SHA256(s).toString(),
window.PSign.sign(d).then(function(e) {
r.h5st = encodeURI(e.h5st);
//......................
}
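The signing call is window.PSign.sign(d);
its result is then stored with r.h5st = encodeURI(e.h5st);
The call is asynchronous: the result arrives in the .then callback.
2. The signing inputs in detail:
(1) The body parameter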
{"skuId":6515029,"cat":"1316,1381,1391","area":"25_2258_2261_6568","shopId":"1000099941","venderId":1000099941,"paramJson":"{\"platform2\":\"1\",\"specialAttrStr\":\"p0ppppppppppp1pppppppppppp\",\"skuMarkStr\":\"00\"}","num":1,"bbTraffic":""}
(2) The d parameter:
{
"appid": "pc-item-soa",
"functionId": "pc_detailpage_wareBusiness",
"client": "pc",
"clientVersion": "1.0.0",
"t": 1692498783586,
"body": "dddd48059b91f87eb42b080167bd70b5303b3df8c4b71a3967372fcda60cd496"
}
d.body = SHA256(s).toString()
Step in with F11 to find the SHA256 implementation, then extract it:
(SHA256 implementation)
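In Python there is no need to port the extracted JavaScript: hashlib provides SHA-256 directly. The sketch below assumes that s is the compact JSON string of the body and that SHA256(s).toString() yields a lowercase hex digest, which is consistent with the 64-character value shown in d.body. The name body_digest is illustrative, and the digest only matches the browser's if the JSON string is byte-for-byte identical.

```python
import hashlib
import json


def body_digest(body: dict) -> str:
    """Hex SHA-256 of the compact JSON body, mirroring d.body = SHA256(s).toString()."""
    # Assumption: the page hashes the JSON text without extra whitespace.
    s = json.dumps(body, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


digest = body_digest({"skuId": 6515029, "num": 1})
print(len(digest), digest)
```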
(3) The t parameter
t:a
a = (new Date).getTime()
The t parameter is a timestamp in milliseconds.
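Putting the three inputs together, the object handed to the signer can be sketched as follows. make_sign_input is an illustrative name; the fixed client and clientVersion values are taken from the captured d object, and int(time.time() * 1000) is the Python counterpart of (new Date).getTime().

```python
import time


def make_sign_input(appid, function_id, body_sha256, now_ms=None):
    """Build the dict passed to the signer; t defaults to the current time in ms."""
    t = int(time.time() * 1000) if now_ms is None else now_ms
    return {
        "appid": appid,
        "functionId": function_id,
        "client": "pc",
        "clientVersion": "1.0.0",
        "t": t,
        "body": body_sha256,  # the SHA-256 hex digest, not the raw JSON
    }


d = make_sign_input(
    "pc-item-soa",
    "pc_detailpage_wareBusiness",
    "dddd48059b91f87eb42b080167bd70b5303b3df8c4b71a3967372fcda60cd496",
    now_ms=1692498783586,
)
print(d)
```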
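That covers the signing inputs. Next, the h5st signing algorithm itself.
3. The h5st signing algorithm
Set a breakpoint at window.PSign.sign(d) and press F11 to step in:
(Figure 3) The h5st signing algorithm
Once inside the JS file that implements the signature, save the whole file. Its name is js_security_v3_0.1.4.js
(contents of js_security_v3_0.1.4.js)
4. Object returned by the h5st signing call: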
{
"appid": "pc-item-soa",
"functionId": "pc_detailpage_wareBusiness",
"client": "pc",
"clientVersion": "1.0.0",
"t": 1692498783586,
"body": "dddd48059b91f87eb42b080167bd70b5303b3df8c4b71a3967372fcda60cd496",
"_stk": "appid,body,client,clientVersion,functionId,t",
"_ste": 1,
"h5st": "20230820131419818;9m99mz6itng955u3;fb5df;tk03w9d441cbf18nk990HQLMH0ehQyR5j8EBXtSrYlGtY8KzYUkKCoUctg6u1pqtBeAqYw-t1yFcromGuN17RlgILtyk;65001318ffed0d17ee21652afb01a996;4.1;1692508459818;ee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980acde1a10effcc27fd84ad69b6a255fa2bacbfc5a0cc8222e4ac53b669906820b1461c75971601a3f031b5c1f40b721502f3b79e32d29b726ebec75a213493a818f67211b187fcf51e032e0b772bee8c70e4a1d7502aa775b148a504a31d627d6db4fde5974622b566cdace3d88a8999574369ad4a27c752e256a8a6d92a5fdfa8633dae1aa5d17f9ea6a859ed6b22c920d7881227b2f7f61f3bbf82c17afd340c42be154e8e3ad1d39c2d8ba94acb84c25299080b5545acc894168647303ed"
}
The h5st field is the one we need.
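Two things can be read off the returned object. In this capture, _stk is simply the signed keys sorted and joined with commas, and the h5st value still contains raw semicolons that show up as %3B in the final URL. A sketch of both follows; signed_keys and append_h5st are illustrative names, and the h5st sample is shortened to its first three fields.

```python
from urllib.parse import quote


def signed_keys(sign_input: dict) -> str:
    """Sorted, comma-joined key list, matching the _stk value seen in the capture."""
    return ",".join(sorted(sign_input))


def append_h5st(unsigned_url: str, signed: dict) -> str:
    """Append the percent-encoded h5st field to an already built query URL."""
    return unsigned_url + "&h5st=" + quote(signed["h5st"], safe="")


sign_input = {
    "appid": "pc-item-soa",
    "functionId": "pc_detailpage_wareBusiness",
    "client": "pc",
    "clientVersion": "1.0.0",
    "t": 1692498783586,
    "body": "dddd48059b91f87eb42b080167bd70b5303b3df8c4b71a3967372fcda60cd496",
}
print(signed_keys(sign_input))

signed = {"h5st": "20230820131419818;9m99mz6itng955u3;fb5df"}
print(append_h5st("https://api.m.jd.com/?appid=pc-item-soa", signed))
```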
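III. Calling the signer from Python and other languages
js_security_v3_0.1.4.js is the actual signing file, but it cannot be called from Python as-is: it fails with an error about window being missing, so the browser environment has to be stubbed first.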
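One common way to stub the environment is to prepend a few global definitions to the signer source before handing it to the JavaScript runtime. The sketch below only shows that string assembly. The stub contents are an assumption and almost certainly incomplete for the real signer, which may also read cookies, localStorage or canvas data. BROWSER_STUBS and with_browser_stubs are illustrative names.

```python
# Minimal, hypothetical browser globals; extend as the signer reports missing names.
BROWSER_STUBS = """
var window = (typeof globalThis !== 'undefined') ? globalThis : this;
var document = window.document || {cookie: ''};
var navigator = window.navigator || {userAgent: 'Mozilla/5.0'};
"""


def with_browser_stubs(signer_source: str, stubs: str = BROWSER_STUBS) -> str:
    """Return the signer source with the stub definitions placed in front of it."""
    return stubs + "\n" + signer_source


combined = with_browser_stubs("function sign(){ return typeof window; }")
print(combined)
```

The combined string is what would be passed to execjs.compile, in the same way the article's code passes the contents of h5st.js.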
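Technical support: byc6352
The Python code below calls the signer and requests the endpoint (the browser environment is already stubbed in h5st.js):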
# -*- coding: UTF-8 -*-
import execjs
import requests


def savetofile(text, filename):
    # Save the response text to disk as UTF-8.
    with open(filename, "w", encoding="utf-8") as file:
        file.write(text)


def print_hi(name):
    print(f'Hi, {name}')


def jd(skuid):
    appid = 'item-v3'
    functionId = 'recDivinerApi'
    body = {"lid":27,"lim":15,"ec":"utf-8","uuid":"16900368971511636315768","pin":"","p":902029,"sku":skuid,"ck":"pin,ipLocation,atw,aview","c1":1316,"c2":1387,"c3":11932,"securityToken":"iJJJBrR7BAxWWavOluQxmMQ","clientChannel":"3","clientPageId":"item.jd.com"}
    # h5st.js is the saved signing file with the browser environment stubbed.
    with open("h5st.js", "r", encoding="utf-8") as js_file:
        js = js_file.read()
    exc = execjs.compile(js)
    # sign() returns the complete request URL, including the h5st parameter.
    url = exc.call("sign", appid, functionId, body)
    print('url=' + url)
    # Header names and values must not contain stray spaces.
    # Note: requests can only decode "br" responses if a brotli package is installed.
    headers = {
        "Authority": "api.m.jd.com",
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Cookie": "shshshfpb=i0ZU6VlHi9tt1RukWDDyR0w; 3AB9D23F7A4B3C9B=GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNA; shshshfpa=cb3af5e3-c2cf-dae5-48e3-c2331a38092a-1653253655; shshshfpx=cb3af5e3-c2cf-dae5-48e3-c2331a38092a-1653253655; __jdc=122270672; __jdv=122270672|direct|-|none|-|1689305241830; __jdu=16893052418291576334291; areaId=25; ipLoc-djd=25-2258-2261-6568; token=7a3a5010c8ea7250057d9168270daacd,2,939221; __tk=be32047e11adf495830ad564f7c34cd6,2,939221; 3AB9D23F7A4B3CSS=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTRiDY9CRQSU93J9SUTiPmFy3PTP7N8itsNd7DLuiPzfoEjAAACXCBKUWUQMP7FMX; _gia_d=1; jsavif=1; __jda=122270672.16893052418291576334291.1689305242.1690550636.1690599310.7; __jdb=122270672.1.16893052418291576334291|7.1690599310",
        "Origin": "https://item.jd.com",
        "Referer": "https://item.jd.com/",
        "Sec-Ch-Ua": "\"Not.A/Brand\";v=\"8\", \"Chromium\";v=\"114\", \"Google Chrome\";v=\"114\"",
        "Sec-Ch-Ua-Mobile": "?0",
        "Sec-Ch-Ua-Platform": "\"Windows\"",
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-site",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
        "X-Referer-Page": f"https://item.jd.com/{skuid}.html",
        "X-Rp-Client": "h5_1.0.0",
    }
    # A timeout keeps the script from hanging if the server never answers.
    res = requests.get(url=url, headers=headers, timeout=10)
    print(res)
    text = res.text
    savetofile(text, "sku.txt")
    print(text)
    return text


if __name__ == '__main__':
    print_hi('Product details returned with the latest h5st 4.1 signature. Technical support: byc6352')
    jd(100019322424)
IV. Product details returned successfully in Python
Done!
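Since a wrong signature makes the server return data only occasionally, it can be worth checking that the saved text really is a non-empty JSON object before treating a run as successful. A minimal sketch follows; parse_response is an illustrative name, and the sample string reuses one field from the response excerpt in section I.

```python
import json
from typing import Optional


def parse_response(text: str) -> Optional[dict]:
    """Return the parsed JSON object, or None if the text is not a non-empty JSON object."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict) and data:
        return data
    return None


sample = '{"extendWarrantyInfo": {"descUrl": "https://baozhang.jd.com/static/serviceDesc"}}'
parsed = parse_response(sample)
print(parsed["extendWarrantyInfo"]["descUrl"])
print(parse_response("<html>blocked</html>"))
```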