Before We Start

Environment: PyCharm
Libraries used: re, requests

Process

Finding the URL

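Type a keyword into the search box and you'll see the URL change. Strip out the parameters we don't need and check whether the page still loads correctly (don't ask me how I know which ones are needed and which aren't).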

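After tidying it up, the final URL has this shape: https://s.taobao.com/search?q=<keyword>&s=<offset>, where q is the search keyword and s is the offset of the first result on the page.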
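The parameter trimming described above can be sketched with the standard library. The long URL below is a made-up example (its extra parameters are hypothetical, not copied from Taobao); the hypothetical helper strip_params keeps only q and s, the two parameters the code later in this post relies on.

```python
from urllib.parse import urlsplit, parse_qs, urlencode, urlunsplit

def strip_params(url, keep=("q", "s")):
    """Return `url` with only the query parameters named in `keep`."""
    parts = urlsplit(url)
    # parse_qs maps each parameter name to a list of values (blank values dropped)
    query = parse_qs(parts.query)
    kept = [(k, v) for k in keep if k in query for v in query[k]]
    # Rebuild the URL with the filtered query and no fragment
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

# Hypothetical address-bar URL; everything except q and s is made up
long_url = ("https://s.taobao.com/search?q=%E6%89%8B%E6%9C%BA"
            "&imgfile=&js=1&stats_click=search_radio_all&s=44")
print(strip_params(long_url))
```

Removing one parameter at a time and reloading the page is the manual version of the same idea; the helper just makes the result reproducible.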

Analyzing the Page Source

Open the page source and search for the name of any item: you'll find it stored in the raw_title field.

Likewise, the price is stored in the view_price field, so we can extract both the name and the price.
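To see what the two patterns pick up, here is a minimal sketch run against a made-up fragment shaped like the JSON embedded in the search page. The sample string is hypothetical and much shorter than a real page; only the raw_title and view_price keys are taken from the code in this post.

```python
import re

# Hypothetical fragment; real items carry many more fields
sample = ('{"raw_title":"Phone case A","view_price":"19.90"},'
          '{"raw_title":"Phone case B","view_price":"25.00"}')

titles = re.findall(r'"raw_title":"(.*?)"', sample)
prices = re.findall(r'"view_price":"(.*?)"', sample)

# zip pairs the two lists by position: the n-th title with the n-th price
items = list(zip(titles, prices))
print(items)  # [('Phone case A', '19.90'), ('Phone case B', '25.00')]
```

Pairing by position assumes every item has both fields; if one item lacked a view_price, every later pair would be shifted by one.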

Implementation

First, import the libraries we need:

import re
import requests

Next, fetch the page source:

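def getHTMLText(url):
    try:
        response = requests.get(url, timeout=30)
        # Raise HTTPError for 4xx/5xx responses
        response.raise_for_status()
        response.encoding = 'utf-8'
        return response.text
    except requests.RequestException:
        # Network error, timeout or bad status: return an empty page
        return ''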
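Why force response.encoding = 'utf-8'? If the server's Content-Type header carries no charset, requests may guess ISO-8859-1 for text responses, and Chinese titles then come out as mojibake. The sketch below uses only the standard library to reproduce that effect on the keyword 手机 ("mobile phone"); no network request is made.

```python
# Bytes as a UTF-8 server would send the keyword "手机"
raw = "手机".encode("utf-8")

wrong = raw.decode("iso-8859-1")  # what a Latin-1 guess produces: 6 garbled chars
right = raw.decode("utf-8")        # what forcing utf-8 produces

print(repr(wrong))
print(right)

# Latin-1 maps every byte to one char, so the damage can be undone
assert wrong.encode("iso-8859-1").decode("utf-8") == right
```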

Then parse the page and extract each item's price and title:

def parseHtml(html):
    # Non-greedy (.*?) stops at the first closing quote after each key
    re_title = re.compile(r'"raw_title":"(.*?)"', re.S)
    re_price = re.compile(r'"view_price":"(.*?)"', re.S)
    raw_title = re_title.findall(html)
    view_price = re_price.findall(html)
    # Pair titles and prices by position and print them
    for title, price in zip(raw_title, view_price):
        print(title, price)
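Why (.*?) rather than (.*)? The ? makes the group non-greedy, so it stops at the first closing quote instead of the last one on the line. A quick comparison on a made-up line:

```python
import re

line = '"raw_title":"Case A","view_price":"19.90"'

# Greedy: .* runs to the end, then backtracks only as far as the LAST quote
greedy = re.findall(r'"raw_title":"(.*)"', line)
# Non-greedy: .*? expands one char at a time until the FIRST quote matches
lazy = re.findall(r'"raw_title":"(.*?)"', line)

print(greedy)  # ['Case A","view_price":"19.90']
print(lazy)    # ['Case A']
```

One limitation to keep in mind: a title that itself contains an escaped quote (\") would be cut short at that quote by the non-greedy pattern.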

That's basically it. To round things off, let's add a bit more functionality, such as pagination (this lives in the main function):

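from urllib.parse import quote

def main():
    url = 'https://s.taobao.com/search?q='
    goods = input('Item to search for: ')
    deeps = int(input('Number of pages: '))
    print('-' * 30)
    for i in range(deeps):
        # s is the offset of the first result; each page advances it by 44
        # quote() percent-encodes the keyword so spaces or '&' don't break the URL
        html = getHTMLText(url + quote(goods) + '&s=' + str(44 * i))
        parseHtml(html)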
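The same page URLs can also be built with urllib.parse.urlencode, which percent-encodes every parameter for you. This is a sketch: BASE, PER_PAGE and page_urls are names introduced here for illustration, and the page size of 44 is only inferred from the 44 * i offset in the loop above.

```python
from urllib.parse import urlencode

BASE = "https://s.taobao.com/search"
PER_PAGE = 44  # inferred from the original loop's 44 * i offset

def page_urls(keyword, pages):
    """Build one search URL per page; s is the offset of the first result."""
    return [BASE + "?" + urlencode({"q": keyword, "s": PER_PAGE * i})
            for i in range(pages)]

for u in page_urls("手机", 3):
    print(u)
```

urlencode uses quote_plus, so a space becomes + and & becomes %26; for example, page_urls("a b&c", 1) yields a single URL ending in q=a+b%26c&s=0.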

Nice. Here is the complete code:

import requests
import re
from urllib.parse import quote

# Fetch the page source; returns '' on any network or HTTP error
def getHTMLText(url):
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        response.encoding = 'utf-8'
        return response.text
    except requests.RequestException:
        return ''

# Parse the page and print each item's title and price
def parseHtml(html):
    re_title = re.compile(r'"raw_title":"(.*?)"', re.S)
    re_price = re.compile(r'"view_price":"(.*?)"', re.S)
    raw_title = re_title.findall(html)
    view_price = re_price.findall(html)
    for title, price in zip(raw_title, view_price):
        print(title, price)

def main():
    url = 'https://s.taobao.com/search?q='
    goods = input('Item to search for: ')
    deeps = int(input('Number of pages: '))
    print('-' * 30)
    for i in range(deeps):
        # s is the offset of the first result; each page advances it by 44
        html = getHTMLText(url + quote(goods) + '&s=' + str(44 * i))
        parseHtml(html)

if __name__ == '__main__':
    main()

Done

Here is a screenshot of the result:

[Figure: Python Crawler: Simulating a Taobao Item Search]
