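XPath is a simple, direct way to query XML, but a query stops working once the document uses namespaces: unprefixed element names no longer match anything. The fix is to put a namespace prefix in front of each node name in the expression and to pass a mapping from that prefix to the namespace URI. The prefixes used in the examples below (x, y) are placeholders and can be named freely.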
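The point that the prefix name is arbitrary can be checked with a small sketch. The sample document and the URI `http://example.com/ns` here are made up for illustration; only the URI in the mapping has to match the document, not the prefix.

```python
from lxml import etree

# Hypothetical sample document; the namespace URI is invented for this sketch.
doc = b'<a:root xmlns:a="http://example.com/ns"><a:item>hello</a:item></a:root>'
root = etree.XML(doc)

ns_uri = "http://example.com/ns"

# The prefix in the query is resolved through the `namespaces` mapping,
# so it does not have to be the "a" used inside the document.
with_x = root.xpath("//x:item/text()", namespaces={"x": ns_uri})
with_y = root.xpath("//y:item/text()", namespaces={"y": ns_uri})

# Without a prefix, `item` means an element in no namespace and matches nothing.
without_prefix = root.xpath("//item/text()")

print(with_x)          # ['hello']
print(with_y)          # ['hello']
print(without_prefix)  # []
```

Note that the unprefixed query does not raise an exception; it simply returns an empty list, which is why the problem is easy to overlook.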
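from typing import Optional, Text
from lxml import etree

def with_xpath(xml_data: Text, xpath_expr: str, namespaces: Optional[dict] = None) -> list:
    '''Extract data from an XML document with an XPath expression.'''
    if isinstance(xml_data, str):  # convert the string to bytes
        xml_data = xml_data.encode("utf-8")
    try:
        xml = etree.XML(xml_data)
        if namespaces and isinstance(namespaces, dict):
            # prefix -> namespace URI mapping used to resolve prefixes in xpath_expr
            list_result = xml.xpath(xpath_expr, namespaces=namespaces)
        else:
            list_result = xml.xpath(xpath_expr)
        return list_result
    except Exception as e:
        # keep the original exception attached as the cause
        raise Exception(f"XML: failed to get the requested nodes via XPath, reason: {e}") from e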
1. For example, given the following XML document:
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <soap:Body>text to extract</soap:Body>
</soap:Envelope>
Parsing snippet and result (v holds the XML document above):
>>> with_xpath(v,'//x:Body/text()',{'x': 'http://schemas.xmlsoap.org/soap/envelope/'})
['text to extract']
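As an alternative that is not part of the approach above, XPath's `local-name()` function can match an element by the part of its name after the prefix, so no namespace mapping is needed. A minimal sketch, using a shortened copy of the SOAP document with English sample text:

```python
from lxml import etree

soap_doc = (
    b'<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    b'<soap:Body>text to extract</soap:Body>'
    b'</soap:Envelope>'
)
root = etree.XML(soap_doc)

# local-name() compares only the local part of the element name,
# so the query ignores which namespace the element belongs to.
body_text = root.xpath("//*[local-name()='Body']/text()")
print(body_text)  # ['text to extract']
```

The trade-off is precision: this expression matches a `Body` element in any namespace, whereas the prefixed form matches only the one in the SOAP envelope namespace.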
2. If the XML document declares a default namespace (one without a prefix), as follows:
<getRegionCountryResponse xmlns="http://WebXml.com.cn/">
  <getRegionCountryResult>
    <string>阿尔及利亚,3320</string>
    <string>阿根廷,3522</string>
    <string>阿曼,3170</string>
    <string>阿塞拜疆,3176</string>
  </getRegionCountryResult>
</getRegionCountryResponse>
Parsing snippet and result (v holds the XML document above):
>>> with_xpath(v,'//x:getRegionCountryResponse//x:string/text()',{'x': 'http://WebXml.com.cn/'})
['阿尔及利亚,3320', '阿根廷,3522', '阿曼,3170', '阿塞拜疆,3176']
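Instead of typing the namespace URI by hand, the mapping can be derived from the document itself. The sketch below relies on lxml's `nsmap` property, where a default namespace is stored under the key `None`; since `None` cannot serve as a prefix in an XPath expression, that entry is renamed. The prefix `x` and the shortened sample document are choices made for this illustration.

```python
from lxml import etree

doc = (
    '<getRegionCountryResponse xmlns="http://WebXml.com.cn/">'
    '<getRegionCountryResult>'
    '<string>阿曼,3170</string>'
    '</getRegionCountryResult>'
    '</getRegionCountryResponse>'
).encode("utf-8")
root = etree.XML(doc)

# A default namespace shows up in nsmap under the key None.
print(root.nsmap)  # {None: 'http://WebXml.com.cn/'}

# Give the default namespace an explicit prefix ("x" is arbitrary);
# any prefixed namespaces declared on the root are kept as they are.
namespaces = {prefix or "x": uri for prefix, uri in root.nsmap.items()}

strings = root.xpath("//x:string/text()", namespaces=namespaces)
print(strings)  # ['阿曼,3170']
```

This only picks up namespaces that are in scope on the root element; a namespace first declared on a nested element would still have to be added to the mapping separately.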