
Web Crawler Basics


HTTP/1.1 reference (PDF): http://files.blogjava.net/sunchaojin/http1.3.pdf
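To make the HTTP/1.1 reference concrete, here is a minimal sketch of the raw text a client sends for a GET request and of how the status line of a reply (for example `HTTP/1.1 200 OK`) can be split. `build_get_request` and `parse_status_line` are hypothetical helpers for illustration only; real clients such as Requests send more headers and handle many edge cases this sketch ignores.

```python
def build_get_request(host, path="/", headers=None):
    """Build the raw text of a minimal HTTP/1.1 GET request (illustrative only)."""
    # Request line: method, request target, protocol version
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]  # HTTP/1.1 requires a Host header
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    lines.append("Connection: close")
    # Each line ends with CRLF; an empty line marks the end of the headers
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 200 OK' into (version, code, reason)."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason
```

For example, `parse_status_line("HTTP/1.1 404 Not Found")` gives `("HTTP/1.1", 404, "Not Found")`; the numeric code is what Requests exposes as `status_code`.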

1. View a page's source code: in Chrome, press Ctrl + U to view the source, or F12 to open the developer tools.

2. In PyCharm, create a local web page consisting of an images folder, a CSS file, and an HTML file.

3. Install the libraries lxml, BeautifulSoup4, and Requests.
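BeautifulSoup4 is installed under the name `beautifulsoup4` but imported as `bs4`, which is easy to trip over. A minimal sketch that uses only the standard library's `importlib.util.find_spec` to report which of the three libraries are still missing; `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# pip package name -> module name used in the import statement
REQUIRED = {
    "lxml": "lxml",
    "beautifulsoup4": "bs4",
    "requests": "requests",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("lxml, bs4 and requests are all importable.")
```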

https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/

http://beautifulsoup.readthedocs.io/zh_CN/latest/

http://docs.python-requests.org/zh_CN/latest/user/quickstart.html
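Before fetching anything over the network, the local page from step 2 can be parsed with BeautifulSoup's CSS `select` method. The sketch below assumes a hypothetical layout where each entry is a `<div class="item">` holding an `<img>` and an `<h3>`; adjust the selectors to your own HTML. It uses Python's built-in `'html.parser'` so it runs without lxml; passing `'lxml'` instead works once lxml is installed. `extract_page_items` and `extract_from_file` are hypothetical names.

```python
from bs4 import BeautifulSoup


def extract_page_items(html):
    """Return (heading text, image src) pairs from a simple local page."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumed structure: <div class="item"><img src="..."/><h3>...</h3></div>
    headings = [h.get_text(strip=True) for h in soup.select("div.item > h3")]
    images = [img.get("src") for img in soup.select("div.item > img")]
    return list(zip(headings, images))


def extract_from_file(path):
    """Read a saved HTML file (e.g. the page built in PyCharm) and parse it."""
    with open(path, encoding="utf-8") as f:
        return extract_page_items(f.read())
```

Usage, with a hypothetical file name: `extract_from_file("index.html")`.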

A successful response has status_code 200.
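A minimal sketch of checking the status code before parsing, using the Requests library. `fetch_html` is a hypothetical helper; the `timeout` argument keeps a stalled request from hanging forever. If you prefer exceptions to a `None` return, Requests also provides `response.raise_for_status()`, which raises for 4xx and 5xx responses.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page HTML, or None if the server did not answer 200 OK."""
    response = requests.get(url, timeout=timeout)
    if response.status_code != 200:
        print("Request failed:", response.status_code, url)
        return None
    return response.text
```

For example, `fetch_html('http://www.duzhe.com/index.php?v=listing&cid=38&page=1')` returns the HTML text only when the request succeeds.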

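from bs4 import BeautifulSoup
import requests
import time

# Listing pages 1 to 8 of category cid=38 on duzhe.com
urls = ['http://www.duzhe.com/index.php?v=listing&cid=38&page={}'.format(i) for i in range(1, 9)]


def get_list(url):
    wb_data = requests.get(url)
    time.sleep(1)  # wait one second between requests so the server is not flooded
    if wb_data.status_code != 200:  # 200 means the request succeeded
        print('Request failed:', wb_data.status_code, url)
        return
    soup = BeautifulSoup(wb_data.text, 'lxml')
    # Title link of each article in the listing
    titles = soup.select('#con_warp > div > div > div.left_p > ul > li > div.con_top > h3 > a')
    # First link in each article's icon bar (stored below as the 'like' value)
    likes = soup.select('#con_warp > div > div > div.left_p > ul > li > div.icons_warp > a:nth-of-type(1)')
    # zip pairs the two lists by position
    for title, like in zip(titles, likes):
        data = {
            'title': title.get_text(),
            'like': like.get_text(),
        }
        print(data)


for single_url in urls:
    get_list(single_url)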
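Because `zip` pairs titles and likes purely by position, a single list item without an icon link would shift every later pair. A sketch of an alternative that selects each `<li>` first and looks up the title and like inside it, assuming the same page structure the selectors above imply. `parse_list` is a hypothetical name, and `'html.parser'` stands in for `'lxml'` only so the example runs without extra dependencies. Keeping parsing separate from `requests.get` also lets you test it on saved HTML.

```python
from bs4 import BeautifulSoup

# Each <li> in the listing is assumed to hold one article
ITEM_SELECTOR = '#con_warp > div > div > div.left_p > ul > li'


def parse_list(html):
    """Return one {'title', 'like'} dict per listing item in the page HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for item in soup.select(ITEM_SELECTOR):
        title = item.select_one('div.con_top > h3 > a')
        like = item.select_one('div.icons_warp > a:nth-of-type(1)')
        if title is None:
            continue  # skip items that are not article entries
        results.append({
            'title': title.get_text(strip=True),
            # Missing icon bar: record an empty string instead of shifting pairs
            'like': like.get_text(strip=True) if like is not None else '',
        })
    return results
```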

Last edited: 2018.07.11 16:01:21
