Day two of learning Python as a beginner: scraping product information
Final result:
Code:
from bs4 import BeautifulSoup

# Parse the locally saved page with the lxml parser
with open('/home/ayongga/PycharmProjects/1_2/1_2answer_of_homework/index.html', 'r') as html_data:
    soup = BeautifulSoup(html_data, 'lxml')

# One selector per field; each call returns a list with one entry per product
images = soup.select('body > div > div > div.col-md-9 > div > div > div > img')
titles = soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4 > a')
prices = soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4.pull-right')
comments = soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p.pull-right')
# Select the parent div.ratings so the stars can be counted per product
stars = soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings')

# Pair the lists up by position: one tuple per product
for image, title, price, comment, star in zip(images, titles, prices, comments, stars):
    data = {
        'image': image.get('src'),
        'title': title.get_text(),
        'price': price.get_text(),
        'comment': comment.get_text(),
        # Count only the filled-star spans inside this product's ratings block
        'star': len(star.find_all('span', class_='glyphicon glyphicon-star'))
    }
    print(data)
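The code above builds five parallel lists and pairs them by position with zip. Since zip stops at the shortest list, a product that is missing one field (for example a review count) could shift every later value onto the wrong product. A possible alternative is to select each product card first and then look up the fields inside it. The sketch below is only an illustration: the sample markup, the `div.thumbnail` class, and the `parse_cards` helper are assumptions made up for this example, not taken from the real page, and it uses the built-in `html.parser` instead of lxml so it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, loosely modelled on the selectors used above.
SAMPLE_HTML = """
<div class="col-md-9"><div class="row">
  <div class="col-sm-4"><div class="thumbnail">
    <img src="img/pic_0000.jpg" alt="">
    <div class="caption">
      <h4 class="pull-right">$24.99</h4>
      <h4><a href="#">First product</a></h4>
    </div>
    <div class="ratings">
      <p class="pull-right">65 reviews</p>
      <p>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star-empty"></span>
      </p>
    </div>
  </div></div>
  <div class="col-sm-4"><div class="thumbnail">
    <img src="img/pic_0001.jpg" alt="">
    <div class="caption">
      <h4 class="pull-right">$64.99</h4>
      <h4><a href="#">Second product</a></h4>
    </div>
    <div class="ratings">
      <p class="pull-right">12 reviews</p>
      <p>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star-empty"></span>
      </p>
    </div>
  </div></div>
</div></div>
"""


def parse_cards(html):
    """Return one dict per product card (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Assumption: each product lives in its own div.thumbnail
    for card in soup.select("div.thumbnail"):
        results.append({
            "image": card.select_one("img").get("src"),
            "title": card.select_one("div.caption > h4 > a").get_text(strip=True),
            "price": card.select_one("div.caption > h4.pull-right").get_text(strip=True),
            "comment": card.select_one("div.ratings > p.pull-right").get_text(strip=True),
            # span.glyphicon-star matches filled stars only, not glyphicon-star-empty
            "star": len(card.select("div.ratings span.glyphicon-star")),
        })
    return results


products = parse_cards(SAMPLE_HTML)
for product in products:
    print(product)
```

Because every lookup happens inside one card, the fields of a product stay together even if another card is incomplete. A real page would still need a check for `select_one` returning `None`.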
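Summary:
- One-to-many (one product, many stars): select the parent element, then search inside it
- find_all returns a list, and len counts the elements in that list
- Hands-on practice improved my skills a lot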
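The first two points can be sketched on a small fragment: take the parent `div.ratings`, call `find_all` on it, and pass the resulting list to `len`. The markup and the `count_filled_stars` helper here are made up for illustration, and `html.parser` is used so the snippet runs without lxml. Passing a single class name such as `glyphicon-star` to `class_` matches any tag that carries that class, so the empty-star spans are not counted.

```python
from bs4 import BeautifulSoup

# Hypothetical ratings block: three filled stars and two empty ones.
RATINGS_HTML = """
<div class="ratings">
  <p>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star-empty"></span>
    <span class="glyphicon glyphicon-star-empty"></span>
  </p>
</div>
"""


def count_filled_stars(ratings_tag):
    """Count the filled-star spans inside one ratings block (hypothetical helper)."""
    # find_all returns a list-like ResultSet; len gives the number of matches
    return len(ratings_tag.find_all("span", class_="glyphicon-star"))


soup = BeautifulSoup(RATINGS_HTML, "html.parser")
ratings = soup.find("div", class_="ratings")  # the parent of all the stars

filled = count_filled_stars(ratings)
total = len(ratings.find_all("span"))
print(filled, total)
```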