Getting Started with Python Web Scraping: Scraping Sina News

Environment: Python 3.6.0

Required packages (requests, beautifulsoup4, and lxml, which is used as the parser) and the full script:

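from bs4 import BeautifulSoup
import requests

# Fetch the Sina domestic-news list page and decode it as UTF-8
response = requests.get("http://news.sina.com.cn/china/")
response.encoding = "utf-8"
soup = BeautifulSoup(response.text, "lxml")

# Headline <h2>, its <a> link, and the publication-time <div> of each item
headers = soup.select("div.news-item > h2")
links = soup.select("div.news-item > h2 > a")
times = soup.select("div.time")

# Open the output file once, as UTF-8, so Chinese text is written correctly
# regardless of the operating system's default encoding
with open("sina_news.txt", "a", encoding="utf-8") as f:
    for header, link, pub_time in zip(headers, links, times):
        f.write(header.get_text(strip=True) + "\n" +
                pub_time.get_text(strip=True) + "\n" +
                link.get("href", "") +
                "\n---------------------\n")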
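The script above pairs headlines, links, and times by zipping three independent selector results, so a single item without a `div.time` would shift every later time onto the wrong headline. A minimal sketch of an alternative walks the `div.news-item` containers one at a time, assuming each item's time element is nested inside its own container. The sample HTML is hypothetical (not Sina's actual markup), `parse_news` and `format_record` are illustrative names, and `urljoin` is added to resolve any relative links.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def parse_news(html, base_url):
    """Return one {title, time, url} dict per div.news-item.

    Assumption: each item's <div class="time"> sits inside its own
    div.news-item, so the fields cannot drift out of step as with zip().
    """
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser; "lxml" also works
    records = []
    for item in soup.select("div.news-item"):
        link = item.select_one("h2 > a")
        if link is None or not link.get("href"):
            continue  # skip items that have no headline link
        time_tag = item.select_one("div.time")
        records.append({
            "title": link.get_text(strip=True),
            "time": time_tag.get_text(strip=True) if time_tag else "",
            # Resolve relative hrefs against the page URL
            "url": urljoin(base_url, link["href"]),
        })
    return records


def format_record(record):
    """Render one record in the same layout the original script writes."""
    return (record["title"] + "\n" + record["time"] + "\n" +
            record["url"] + "\n---------------------\n")


# Hypothetical markup, for illustration only
SAMPLE_HTML = """
<div class="news-item">
  <h2><a href="/c/2017-12-10/doc-1.shtml">Headline one</a></h2>
  <div class="info"><div class="time">12月10日 12:00</div></div>
</div>
<div class="news-item">
  <h2><a href="http://news.sina.com.cn/c/doc-2.shtml">Headline two</a></h2>
</div>
<div class="news-item"><h2>No link here</h2></div>
"""

if __name__ == "__main__":
    for rec in parse_news(SAMPLE_HTML, "http://news.sina.com.cn/china/"):
        print(format_record(rec), end="")
```

With the live page, one could call `parse_news(response.text, response.url)` in place of the three `select` calls and write `format_record(rec)` for each returned record. An item with no time gets an empty time line instead of borrowing its neighbour's.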

Scraping results:

[Screenshot of the scraped output]

Last edited: 2017.12.10 12:44:09

© Copyright belongs to the author. Contact the author for reprints or content partnerships.