1. Open Baidu Images and enter a search term
Open https://image.baidu.com/search/index?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=index&fr=&hs=0&xthttps=111110&sf=1&fmq=&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&word=%E7%BE%8E%E5%A5%B3&oq=%E7%BE%8E%E5%A5%B3&rsp=-1
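The word parameter in the URL above is the search term percent-encoded as UTF-8. As a minimal sketch, the standard library can produce the same encoding. The helper name search_url and the reduced parameter set are assumptions made for illustration; whether the site accepts a URL with only these parameters is not confirmed here.

```python
from urllib.parse import quote, urlencode

BASE = "https://image.baidu.com/search/index"


def search_url(term):
    """Build a search-page URL; only tn, ie and word are set in this sketch."""
    params = {"tn": "baiduimage", "ie": "utf-8", "word": term}
    return BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # Percent-encode the search term on its own
    print(quote("美女"))
    # Build a whole URL with the term encoded into the query string
    print(search_url("美女"))
```

Building the query string with urlencode avoids encoding the term by hand when the search word changes.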
2. Analyze the page
Press F12 to open the developer tools.
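3. Use Python to imitate a browser, send a request to the server, and get the response
Module used: requests. Import it before use (import requests).
For installing the module, see https://www.jianshu.com/p/d4262c8d8af8
Open the URL, press F12 to open the developer tools, then go to Network > All, select the request entry (top250?start=0&filter= in the author's example), and open Headers to find the request URL and the request method.
The response data is HTML + CSS + JS + data, which the browser parses and executes.
Code: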
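# Import the module
import requests
# Request headers; the value here is a placeholder, copy the real User-Agent from the Headers panel
headers = {'User-Agent': 'Mozilla/5.0'}
# Send the request (url is the request URL copied from the developer tools)
resp = requests.get(url, headers=headers)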
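As a rough sketch of the request step, the snippet below attaches a browser-like User-Agent header and lets you inspect the request before it is sent. The header value and the helper names build_request and fetch are illustrative assumptions, not part of the original code.

```python
import requests

# Hypothetical header set; replace the User-Agent with the one shown in
# the Headers panel of your own browser.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}


def build_request(url, headers=HEADERS):
    """Prepare (but do not send) a GET request so it can be inspected."""
    return requests.Request("GET", url, headers=headers).prepare()


def fetch(url, headers=HEADERS, timeout=10):
    """Send the GET request and raise an error on a 4xx or 5xx status code."""
    resp = requests.get(url, headers=headers, timeout=timeout)
    resp.raise_for_status()
    return resp


if __name__ == "__main__":
    prepared = build_request("https://image.baidu.com/")
    print(prepared.method, prepared.url)
    print(prepared.headers["User-Agent"])
```

Setting a timeout and checking the status code are optional additions; they make a failed request surface immediately instead of hanging or returning an error page silently.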
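Open the correct URL in the browser, press Ctrl+A to select everything and Ctrl+C to copy it, then paste the text into an online JSON viewer. Each object is one image.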
url = 'https://image.baidu.com/search/acjson?tn=resultjson_com&logid=12117865351080430388&ipn=rj&ct=201326592&is=&fp=result&queryWord=%E7%BE%8E%E5%A5%B3&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=&ic=0&hd=&latest=&copyright=&word=%E7%BE%8E%E5%A5%B3&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&fr=&expermode=&force=&cg=girl&pn=30&rn=30&gsm=1e&1612964334559='
resp = requests.get(url, headers=headers)
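One way to see what such a long URL carries is to split its query string into key/value pairs. The sketch below uses an abbreviated copy of the acjson URL (most empty parameters are dropped for readability); the helper name query_params is an assumption for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Abbreviated copy of the acjson URL from the text (most empty parameters dropped).
URL = ("https://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj"
       "&queryWord=%E7%BE%8E%E5%A5%B3&ie=utf-8&oe=utf-8"
       "&word=%E7%BE%8E%E5%A5%B3&pn=30&rn=30")


def query_params(url):
    """Return the query string as a dict of single decoded values."""
    parsed = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {key: values[0] for key, values in parsed.items()}


if __name__ == "__main__":
    for key, value in query_params(URL).items():
        print(key, "=", value)
```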
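The parsed result shows that the data we want sits inside a dictionary: the objects are in a list, the list contains many dictionaries, and each image link is stored in one of those dictionaries.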
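The structure described above can be sketched with a small stand-in payload. The sample JSON and the helper name thumb_urls are hypothetical; the real response has many more fields per entry, and this sketch assumes the list sits under a data key and that some entries may lack a thumbURL.

```python
import json

# Hypothetical, heavily shortened payload with the shape described above.
SAMPLE = '''
{"data": [
    {"thumbURL": "https://example.com/a.jpg"},
    {"thumbURL": "https://example.com/b.jpg"},
    {}
]}
'''


def thumb_urls(payload):
    """Collect the thumbURL of every entry that has one."""
    entries = payload.get("data", [])
    return [e["thumbURL"] for e in entries if e.get("thumbURL")]


if __name__ == "__main__":
    print(thumb_urls(json.loads(SAMPLE)))
```

Using .get() instead of direct indexing means an entry without the key is skipped rather than raising a KeyError.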
# Continue extracting data: iterate over the list and read each thumbURL value by key
# (assumes the list is under the 'data' key and that the img/ directory already exists)
for count, entry in enumerate(resp.json()['data']):
    item = entry.get('thumbURL')
    if item:
        img = requests.get(item, headers=headers)
        with open('img/'+str(count)+'.jpg', 'wb') as file:
            file.write(img.content)
response.content returns the response body as binary data (bytes), which is what an image file needs.
response.text returns the response body decoded as a string.
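As a minimal sketch of the saving step, the helper below writes raw bytes to a numbered .jpg file and creates the target directory if it is missing. The name save_image and the stand-in bytes are assumptions for illustration; in the scraper, the data argument would be the .content of the image response.

```python
import tempfile
from pathlib import Path


def save_image(data, index, directory="img"):
    """Write raw image bytes to <directory>/<index>.jpg and return the path."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    path = folder / f"{index}.jpg"
    path.write_bytes(data)  # binary write, equivalent to open(path, 'wb')
    return path


if __name__ == "__main__":
    # Stand-in bytes written to a temporary folder; not a real image
    fake = b"\xff\xd8\xff\xe0 not a real image"
    print(save_image(fake, 0, directory=tempfile.mkdtemp()))
    # Decoding bytes to str is what .text does for you; it suits HTML or JSON, not images
    print(b'{"ok": true}'.decode("utf-8"))
```

Images must be written from bytes in binary mode; writing the decoded string form would corrupt the file.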