1. Introduction to BeautifulSoup

1.1 Running BeautifulSoup

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('https://www.pythonscraping.com/pages/page1.html')
# BeautifulSoup accepts the raw bytes and detects the encoding itself
bs = BeautifulSoup(html.read(), 'lxml')
print(bs.h1)
# Output:
# <h1>An Interesting Title</h1>
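Attribute access such as `bs.h1` returns the first matching tag anywhere in the document. The minimal offline sketch below parses a small hypothetical HTML string (`SAMPLE_HTML`, not the real page1 markup) with the built-in `html.parser`, so it runs without network access, and shows that several lookups reach the same tag:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page
SAMPLE_HTML = """
<html>
  <head><title>A Sample Page</title></head>
  <body>
    <h1>An Interesting Title</h1>
    <div>Some text.</div>
  </body>
</html>
"""

bs = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Each of these returns the first <h1> tag in the document
print(bs.h1)
print(bs.body.h1)
print(bs.html.body.h1)
print(bs.find('h1'))

# .get_text() returns only the text inside the tag, without the markup
print(bs.h1.get_text())
```

The short form `bs.h1` is convenient, while spelling out the path (`bs.html.body.h1`) documents where the tag is expected to be.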

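# Alternatives: only the second argument (the parser) changes.
# A response body can be read only once, so use one of these per urlopen() call.
bs = BeautifulSoup(html.read(), 'html.parser')
bs = BeautifulSoup(html.read(), 'lxml')
bs = BeautifulSoup(html.read(), 'html5lib')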

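The first argument is the HTML to parse (a string or bytes) and the second names the parser. Three parsers are available, each with its own trade-offs: html.parser ships with the Python standard library, so nothing extra needs to be installed; lxml is very fast and lenient but must be installed separately (pip install lxml); html5lib parses pages the way a web browser does and copes with badly broken markup, but it is slower and also needs a separate install (pip install html5lib).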
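The difference between parsers shows up mostly on malformed HTML. The sketch below feeds the same broken fragment to each parser and skips any parser that is not installed; `parse_with` and `BROKEN_HTML` are hypothetical names used only for this illustration. BeautifulSoup raises `FeatureNotFound` when the requested parser is unavailable.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately malformed markup: a stray closing </p> inside <a>
BROKEN_HTML = '<a></p>'

def parse_with(markup, parser):
    """Return the parsed document as a string, or None if the parser is not installed."""
    try:
        return str(BeautifulSoup(markup, parser))
    except FeatureNotFound:
        # Raised when, for example, lxml or html5lib has not been installed
        return None

for parser in ('html.parser', 'lxml', 'html5lib'):
    result = parse_with(BROKEN_HTML, parser)
    print(f'{parser:12} -> {result if result is not None else "(not installed)"}')
```

According to the BeautifulSoup documentation, html.parser turns this fragment into `<a></a>`, while lxml and html5lib also add the missing `<html>` and `<body>` wrappers, and html5lib keeps an empty `<p>` inside the `<a>`. Exact output may vary between library versions, so a scraper should be tested with the parser it will actually use.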

1.2 Reliable Network Connections and Exception Handling

from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup

def getTitle(url):
    try:
        html = urlopen(url)
    except (HTTPError, URLError):
        # HTTPError: the server returned an error status such as 404 or 500
        # URLError: the server could not be reached at all
        return None
    try:
        bs = BeautifulSoup(html.read(), 'lxml')
        title = bs.body.h1
    except AttributeError:
        # bs.body is None when the page has no <body>, so .h1 raises AttributeError
        return None
    return title

title = getTitle('https://www.pythonscraping.com/pages/page1.html')
if title is None:
    print('Title could not be found')
else:
    print(title)
# Output:
# <h1>An Interesting Title</h1>
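The two `None` paths in `getTitle` can be exercised without any network access. This sketch applies the same `bs.body.h1` lookup to hypothetical HTML strings, using `html.parser` instead of `lxml` (an assumption made so that a missing `<body>` is not filled in automatically); `title_from_markup` is a hypothetical name. A page with a body but no `<h1>` yields `None` directly, while a fragment with no `<body>` makes `bs.body` itself `None`, so `.h1` raises `AttributeError`.

```python
from bs4 import BeautifulSoup

def title_from_markup(markup):
    """Hypothetical offline variant of getTitle's parsing step."""
    bs = BeautifulSoup(markup, 'html.parser')
    try:
        # bs.body is None when the markup has no <body>, so .h1 raises
        return bs.body.h1
    except AttributeError:
        return None

cases = {
    'has title': '<html><body><h1>Hello</h1></body></html>',
    'no h1': '<html><body><p>No heading</p></body></html>',
    'no body': '<p>Just a fragment</p>',
}
for name, markup in cases.items():
    print(name, '->', title_from_markup(markup))
```

With lxml or html5lib a missing `<body>` is normally added during parsing, so with those parsers the `title is None` check after the call matters at least as much as the `except` clause.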

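When writing a scraper, plan the overall layout of the code so that it both catches exceptions and stays easy to read. In getTitle, each step that can fail sits in its own try block and returns None on failure, so the calling code needs only a single check.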
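The same idea extends to deeper lookups. The helper `find_nested` below is hypothetical, not part of BeautifulSoup: it follows a chain of tag names with `find()` and stops at the first missing one instead of raising `AttributeError`, so the top-level code stays a straight sequence with one check per result.

```python
from bs4 import BeautifulSoup

def find_nested(node, *names):
    """Follow a chain of tag names; return None as soon as a step is missing."""
    for name in names:
        if node is None:
            return None
        node = node.find(name)
    return node

page = BeautifulSoup(
    '<html><body><div><h1>Nested title</h1></div></body></html>',
    'html.parser',
)

heading = find_nested(page, 'body', 'div', 'h1')
missing = find_nested(page, 'body', 'table', 'tr')

# One readable check per result instead of a try/except around each lookup
print(heading.get_text() if heading is not None else 'heading not found')
print(missing if missing is not None else 'table row not found')
```

Returning `None` rather than raising keeps the error handling in one place, at the cost of losing which step failed; if that detail matters, raising a custom exception is the alternative.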
