Python Hands-On Plan: week1_3 Project

The third project in the Python Hands-On Plan: scraping rental listings.

The final result looks like this:

[Image: one_three.png, screenshot of the final output]

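The scrape covers 9 search-result pages with 24 rooms per page, 216 rooms in total, i.e. 216 records.
Each record has 7 fields: title, address, daily rent, link to the first room photo, link to the host's avatar, host's gender, and host's name.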
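Written as a typed record, one row of that data set looks roughly like the sketch below. It is a minimal sketch under assumptions: the class name `Listing` and the constants are hypothetical, and the field names are chosen to match the dictionary keys used in the scraper (including its spelling `addre` for the address). Every field is a string because the scraper stores raw page text or URLs.

```python
from dataclasses import dataclass, fields


@dataclass
class Listing:
    """One scraped room (hypothetical type; field names mirror the scraper's dict keys)."""
    title: str    # listing title
    addre: str    # address
    price: str    # daily rent, as the text shown on the page
    image: str    # URL of the first room photo
    picture: str  # URL of the host's avatar
    sex: str      # host gender label
    name: str     # host name


# Record count stated in the text: 9 pages x 24 rooms per page
PAGES = 9
ROOMS_PER_PAGE = 24
TOTAL_RECORDS = PAGES * ROOMS_PER_PAGE

print(len(fields(Listing)), TOTAL_RECORDS)
```

Because the keys match, a dict built by the scraper converts directly with `Listing(**data)`.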

The code:

import requests
from bs4 import BeautifulSoup
import time


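def get_links(url):
    # Fetch one search-result page and visit every listing linked from it
    wb_data = requests.get(url)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    # Each <li> in the result list wraps an <a> pointing at a listing's detail page
    links = soup.select('#page_list > ul > li > a')
    for link in links:
        href = link.get('href')
        one(href)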


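def if_sex(sexname):
    # sexname is the class list of the host's gender icon, e.g. ['member_girl_ico']
    if sexname == ['member_girl_ico']:
        return '女'  # female
    elif sexname == ['member_boy_ico']:
        return '男'  # male
    else:
        return '没填写'  # not filled in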


def one(url, data=None):
    # Scrape a single listing's detail page and print its 7 fields
    wb_data = requests.get(url)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    titles = soup.select('div.pho_info > h4 > em')  # title
    addres = soup.select('div.pho_info > p > span.pr5')  # address
    prices = soup.select('#pricePart > div.day_l > span')  # daily rent
    images = soup.select('#curBigImage')  # first room photo
    pictures = soup.select('#floatRightBox > div.js_box.clearfix > div.member_pic > a > img')  # host avatar
    sexes = soup.select('#floatRightBox > div.js_box.clearfix > div.w_240 > h6 > span')  # host gender icon
    names = soup.select('#floatRightBox > div.js_box.clearfix > div.w_240 > h6 > a')  # host name
    # print(titles,addres,prices,pictures,names)
    if data is None:
        # zip pairs up the i-th match of every selector; it stops at the shortest
        # list, so if any selector matches nothing, nothing is printed for this page
        for title, addre, price, picture, name, sex, image in zip(
                titles, addres, prices, pictures, names, sexes, images):
            data = {
                'title': title.get_text(),
                # strip the newlines and spaces that surround the address text
                'addre': addre.get_text().replace('\n', '').replace(' ', ''),
                'price': price.get_text(),
                'picture': picture.get('src'),
                'name': name.get_text(),
                'sex': if_sex(sex.get('class')),
                'image': image.get('src')
            }
            print(data)


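# Search-result pages 1 to 9 (the dates in the query string are fixed)
urls = ['http://wh.xiaozhu.com/search-duanzufang-p{}-0/?startDate=2016-07-17&endDate=2016-08-24'.format(i)
        for i in range(1, 10)]

for url in urls:
    get_links(url)
    time.sleep(2)  # pause between pages to avoid sending requests too quickly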
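The `if_sex` helper compares the whole class list against a one-element list, so it only matches when the gender icon carries exactly that single class. A dictionary lookup over the classes is a common alternative. The sketch below assumes the same two class names and labels as the original; the table name `SEX_BY_CLASS` is hypothetical. It also falls back to the "not filled in" label when `sex.get('class')` returns `None` (Beautiful Soup returns `None` for a missing attribute).

```python
# Hypothetical lookup table: CSS class of the host's gender icon -> label
# printed by the scraper (女 = female, 男 = male)
SEX_BY_CLASS = {
    'member_girl_ico': '女',
    'member_boy_ico': '男',
}


def if_sex(classes):
    """Return the gender label for a list of CSS classes (or None)."""
    for cls in classes or []:
        if cls in SEX_BY_CLASS:
            return SEX_BY_CLASS[cls]
    return '没填写'  # not filled in


print(if_sex(['member_boy_ico']))  # 男
```

The design choice here is tolerance: the icon still maps correctly if the page adds extra classes to the same element, and adding a new label only means adding a dictionary entry.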

Summary:

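1. Break a large task into smaller ones wherever possible, and be clear about the input each piece needs and the output it produces.
2. replace('a', 'b'): the string method replace substitutes b for every occurrence of a.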
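Point 2 is what cleans up the address field in the scraper. The sketch below shows the same chained calls on a made-up sample string (the sample text is an assumption, not real scraped data). Note that `str.replace` returns a new string and leaves the original unchanged, and that removing every space also glues words together, which is harmless for Chinese addresses but not for text where spaces matter.

```python
# Made-up sample with the kind of surrounding newlines and spaces
# that get_text() can return for the address span
raw = '\n    Some Road, No. 8\n    '

# Chained replace, as in the scraper: drop newlines, then drop spaces
compact = raw.replace('\n', '').replace(' ', '')
print(compact)  # SomeRoad,No.8

# The optional third argument limits how many occurrences are replaced
print('a-b-c'.replace('-', '+', 1))  # a+b-c

# Alternative when inner spaces matter: split on any whitespace and
# rejoin with single spaces, which trims both ends
tidy = ' '.join(raw.split())
print(tidy)  # Some Road, No. 8
```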
