Parsing namespaced XML with XPath in Python

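Parsing XML with XPath is simple and clear, but once the XML uses namespaces, a plain expression stops working: unprefixed node names no longer match anything. The fix is to put a namespace prefix in front of each node name and pass a mapping from that prefix to the namespace URI. The prefix (x in the examples below) is just a variable name and can be chosen freely; only the URI has to match the one declared in the document.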

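from typing import Optional, Union
from lxml import etree

def with_xpath(xml_data: Union[str, bytes], xpath_expr: str, namespaces: Optional[dict] = None) -> list:
    '''Extract data from an XML document with an XPath expression.'''
    if isinstance(xml_data, str):  # convert the string to bytes
        xml_data = xml_data.encode("utf-8")
    try:
        xml = etree.XML(xml_data)
        if namespaces and isinstance(namespaces, dict):
            # Pass the prefix -> URI mapping so prefixed names in the expression resolve
            list_result = xml.xpath(xpath_expr, namespaces=namespaces)
        else:
            list_result = xml.xpath(xpath_expr)
        return list_result
    except Exception as e:
        # Chain the original exception so the traceback is preserved
        raise Exception(f"XML: failed to get the specified nodes via xpath, reason: {e}") from e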

1. For example, given the following XML document:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <soap:Body>待提取的文本</soap:Body>
</soap:Envelope>

Parsing snippet and result (v holds the XML document above as a string):

>>> with_xpath(v,'//x:Body/text()',{'x': 'http://schemas.xmlsoap.org/soap/envelope/'})
['待提取的文本']
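
To see why the prefix is needed and why its name does not matter, here is a minimal sketch that calls lxml directly. The payload text and the prefix name `anything` are made up for illustration; only the SOAP namespace URI comes from the document above.

```python
from lxml import etree

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# Simplified stand-in for the SOAP document above (the body text is illustrative).
doc = (
    f'<soap:Envelope xmlns:soap="{SOAP_NS}">'
    "<soap:Body>payload</soap:Body>"
    "</soap:Envelope>"
).encode("utf-8")
root = etree.XML(doc)

# Without a prefix, "Body" means "Body in no namespace", so nothing matches.
no_prefix = root.xpath("//Body/text()")

# The prefix is local to the query: any name works as long as it maps to the same URI.
with_x = root.xpath("//x:Body/text()", namespaces={"x": SOAP_NS})
with_other = root.xpath("//anything:Body/text()", namespaces={"anything": SOAP_NS})

print(no_prefix, with_x, with_other)
```

The unprefixed query comes back empty rather than raising an error, which is why this problem is easy to overlook.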

2. If the XML document declares a default namespace (no prefix) instead:

<getRegionCountryResponse xmlns="http://WebXml.com.cn/">
    <getRegionCountryResult>
        <string>阿尔及利亚,3320</string>
        <string>阿根廷,3522</string>
        <string>阿曼,3170</string>
        <string>阿塞拜疆,3176</string>
    </getRegionCountryResult>
</getRegionCountryResponse>

Parsing snippet and result (v holds the XML document above as a string):

>>> with_xpath(v,'//x:getRegionCountryResponse//x:string/text()',{'x': 'http://WebXml.com.cn/'})
['阿尔及利亚,3320', '阿根廷,3522', '阿曼,3170', '阿塞拜疆,3176']
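
For documents with a default namespace, the mapping does not have to be typed by hand. The sketch below shows two possible alternatives, under stated assumptions: the sample data is shortened and invented, and the prefix `d` is an arbitrary choice. It relies on lxml's `nsmap` attribute, which stores the default namespace under the key `None`; XPath cannot use an empty prefix, so that key has to be renamed first.

```python
from lxml import etree

# Shortened stand-in for the response above (values are illustrative).
doc = (
    '<getRegionCountryResponse xmlns="http://WebXml.com.cn/">'
    "<getRegionCountryResult>"
    "<string>a,1</string><string>b,2</string>"
    "</getRegionCountryResult>"
    "</getRegionCountryResponse>"
).encode("utf-8")
root = etree.XML(doc)

# root.nsmap is {None: 'http://WebXml.com.cn/'}: the default namespace has no prefix.
# Rename the None key to a real prefix ("d" here) before handing it to xpath().
ns = {(prefix or "d"): uri for prefix, uri in root.nsmap.items()}
via_nsmap = root.xpath("//d:string/text()", namespaces=ns)

# Alternative: match on the local name only and ignore namespaces altogether.
via_local_name = root.xpath("//*[local-name()='string']/text()")

print(via_nsmap, via_local_name)
```

The `local-name()` form is convenient for quick inspection, but it also matches elements of the same name in other namespaces, so the explicit mapping is the safer default.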

Last edited: 2021.08.24 11:24:16
