On Windows 10, install with pip.
Install beautifulsoup4 first, then install lxml as well (an HTML parser that Beautiful Soup can use).
pip install beautifulsoup4
pip install lxml
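To confirm that both packages landed in the active environment, a quick check like the following can help. This is only a sketch: the `installed` dictionary is an illustrative name, and `bs4` is the import name of the beautifulsoup4 package.

```python
import importlib.util

# Import names to look for: beautifulsoup4 is imported as "bs4".
packages = ("bs4", "lxml")

# find_spec returns None when a top-level package cannot be found.
installed = {name: importlib.util.find_spec(name) is not None for name in packages}

for name, ok in installed.items():
    print(name, "found" if ok else "missing (run pip install again)")
```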
To use Beautiful Soup, pass it either an open file or a string of markup:
from bs4 import BeautifulSoup
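# Open the file in a with-block so the handle is closed after parsing
with open("index.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "lxml")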
soup2 = BeautifulSoup("<html>data</html>","lxml")
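The two input forms can be compared side by side. The sketch below writes a throwaway file first so that it runs anywhere; the markup, the temporary path, and the variable names are made up for illustration, and it assumes the built-in "html.parser" so that lxml is not required (pass "lxml" instead if it is installed).

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

markup = "<html><body><p>data</p></body></html>"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "index.html"
    path.write_text(markup, encoding="utf-8")
    # Form 1: an open file handle
    with path.open(encoding="utf-8") as f:
        soup_from_file = BeautifulSoup(f, "html.parser")

# Form 2: a string of markup
soup_from_string = BeautifulSoup(markup, "html.parser")

print(soup_from_file.p.string, soup_from_string.p.string)
```

Both calls should produce the same tree, since the constructor reads the file contents before parsing.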
Beautiful Soup converts an HTML document into a tree structure in which every node is a Python object. These objects fall into four kinds: Tag, NavigableString, BeautifulSoup, and Comment.
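A small sketch of where each of the four kinds shows up in a parsed tree. The HTML snippet and variable names are invented for illustration, and the built-in "html.parser" is assumed here so the example runs without lxml.

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag

html = '<p class="intro">Hello<!-- a note --></p>'
soup = BeautifulSoup(html, "html.parser")  # the BeautifulSoup object is the whole document

p = soup.p            # Tag: an element such as <p>
text = p.contents[0]  # NavigableString: the text inside the tag
note = p.contents[1]  # Comment: a special kind of NavigableString

for name, obj in (("soup", soup), ("p", p), ("text", text), ("note", note)):
    print(name, type(obj).__name__)
```

Comment is a subclass of NavigableString, so use an isinstance check against Comment when comments need to be told apart from ordinary text.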