File: icd10cm_tabular_2019.xml
File size: 8.7 MB
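Test 1
Load the entire XML file into memory, then build the document tree 10 times each with ElementTree.fromstring and BeautifulSoup (given the file contents). The code is as follows: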
from bs4 import BeautifulSoup
from xml.etree import ElementTree
import time
from tqdm import trange

xml_path = "./icd10cm_tabular_2019.xml"

# Read the whole file into memory once, so disk I/O is excluded from the timings.
with open(xml_path, "rb") as xml_file:
    xml_content = xml_file.read()

# Build the tree 10 times with ElementTree from the in-memory bytes.
start_time = time.time()
for _ in trange(10):
    ElementTree.fromstring(xml_content)
print("ElementTree", round(time.time() - start_time, 2))

# Build the tree 10 times with BeautifulSoup using the lxml-xml parser.
start_time = time.time()
for _ in trange(10):
    BeautifulSoup(xml_content, "lxml-xml")
print("BeautifulSoup lxml-xml", round(time.time() - start_time, 2))
Results:
Parser | Runs | Total time (s) | Average time (s) |
---|---|---|---|
ElementTree | 10 | 8.53 | 0.85 |
BeautifulSoup lxml-xml | 10 | 149.3 | 14.93 |
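As a rough sketch, the timing loop used in this test could be wrapped in a small helper that reports the average time per parse directly. The name `average_seconds` and the tiny `sample_xml` document are illustrative assumptions, not part of the original test. The helper uses `time.perf_counter`, which is generally better suited to measuring intervals than `time.time`.

```python
import time
from xml.etree import ElementTree


def average_seconds(build_tree, repeats=10):
    """Call build_tree() `repeats` times and return the mean wall-clock seconds per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        build_tree()
    return (time.perf_counter() - start) / repeats


# Tiny stand-in document (hypothetical); replace it with the real file's bytes
# to reproduce the measurement on icd10cm_tabular_2019.xml.
sample_xml = b"<root>" + b"<item>text</item>" * 1000 + b"</root>"

avg = average_seconds(lambda: ElementTree.fromstring(sample_xml), repeats=10)
print(f"ElementTree.fromstring: {avg:.6f} s per parse")
```

Passing a different callable, for example one that calls `BeautifulSoup(sample_xml, "lxml-xml")`, would time the other parser under the same conditions.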
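Test 2
Do not load the XML into memory first. Instead, read directly from the file and build the document tree 10 times each with ElementTree.parse and BeautifulSoup (given the file object). The code is as follows: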
from bs4 import BeautifulSoup
from xml.etree import ElementTree
import time
from tqdm import trange

xml_path = "/Users/yangxiao/DATA/2019-ICD-10-CM/icd10cm_tabular_2019.xml"

# Open the file and parse it with ElementTree 10 times; file reading is included in the timing.
start_time = time.time()
for _ in trange(10):
    with open(xml_path, "rb") as xml_file:
        ElementTree.parse(xml_file)
print("ElementTree", round(time.time() - start_time, 2))

# Open the file and parse it with BeautifulSoup (lxml-xml parser) 10 times.
start_time = time.time()
for _ in trange(10):
    with open(xml_path, "rb") as xml_file:
        BeautifulSoup(xml_file, "lxml-xml")
print("BeautifulSoup lxml-xml", round(time.time() - start_time, 2))
Results:
Parser | Runs | Total time (s) | Average time (s) |
---|---|---|---|
ElementTree | 10 | 6.2 | 0.62 |
BeautifulSoup lxml-xml | 10 | 105.56 | 10.56 |
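The two ElementTree entry points used in these tests differ slightly in what they accept and return, which the minimal sketch below illustrates. The `sample_xml` document and its element names are made up for illustration and do not reflect the structure of the ICD-10-CM file.

```python
import io
from xml.etree import ElementTree

# Hypothetical miniature document, used only to show the two call styles.
sample_xml = b"<diagnoses><diag code='A00'/><diag code='A01'/></diagnoses>"

# fromstring takes the document contents and returns the root Element.
root_from_string = ElementTree.fromstring(sample_xml)

# parse takes a filename or file object and returns an ElementTree wrapper;
# getroot() gives the equivalent root Element.
tree = ElementTree.parse(io.BytesIO(sample_xml))
root_from_file = tree.getroot()

print(root_from_string.tag, len(root_from_string))  # diagnoses 2
print(root_from_file.tag, len(root_from_file))      # diagnoses 2
```

Either way the resulting tree has the same content, so the comparison between the two tests is about how the input is supplied, not about what is built.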
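Conclusion:
ElementTree is clearly faster than BeautifulSoup with the lxml-xml parser: more than 10 times faster in both tests, and roughly 17 times faster going by the measured totals.
ElementTree.parse was faster than ElementTree.fromstring (6.2 s versus 8.53 s for 10 parses).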
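To make the size of the gap concrete, the ratios can be computed from the totals in the result tables. This is plain arithmetic on the reported numbers; the variable names are only illustrative.

```python
# Total seconds for 10 parses, copied from the result tables above.
totals = {
    "Test 1 (in memory)": {"ElementTree": 8.53, "BeautifulSoup": 149.3},
    "Test 2 (file object)": {"ElementTree": 6.2, "BeautifulSoup": 105.56},
}

# How many times longer BeautifulSoup took than ElementTree in each test.
ratios = {
    name: t["BeautifulSoup"] / t["ElementTree"] for name, t in totals.items()
}
for name, ratio in ratios.items():
    print(f"{name}: BeautifulSoup took {ratio:.1f}x as long as ElementTree")

# How much longer fromstring (Test 1) took than parse (Test 2) for ElementTree.
fromstring_vs_parse = 8.53 / 6.2
print(f"ElementTree.fromstring / ElementTree.parse: {fromstring_vs_parse:.2f}x")
```

On these numbers the ratio is about 17.5 in Test 1 and about 17.0 in Test 2, and fromstring took about 1.38 times as long as parse.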