lxml.objectify

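Adapted and translated from: https://lxml.de/objectify.html
lxml.objectify is mainly intended for data-centric documents: it infers a data type automatically from the content of each leaf node. The module still uses the ElementTree of lxml.etree, but its elements fall into two kinds: tree elements (structural nodes) and data elements (data nodes).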

Basics

The module mirrors part of the etree API, for example:

>>> from lxml import objectify, etree
>>> from io import BytesIO

>>> root = objectify.Element("root")
>>> a = objectify.SubElement(root, "a")

>>> root = objectify.fromstring("<root><a>test</a><b>11</b></root>")

>>> doc = objectify.parse(BytesIO(b"<root><a>test</a><b>11</b></root>"))
>>> root = doc.getroot()

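Note that the module's parse function still produces an ElementTree from lxml.etree, but its elements fall into two kinds: tree elements, whose default type is objectify.ObjectifiedElement, and data elements, whose types are objectify.IntElement, objectify.StringElement, and so on.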

>>> from lxml import objectify
>>> from io import BytesIO
>>> doc = objectify.parse(BytesIO(b"<root><a>test</a><b>5</b></root>"))
>>> type(doc)
<class 'lxml.etree._ElementTree'>
>>> root = doc.getroot()
>>> type(root)
<class 'lxml.objectify.ObjectifiedElement'>
>>> type(root.a), type(root.b)
(<class 'lxml.objectify.StringElement'>, <class 'lxml.objectify.IntElement'>)
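
The type inference described above can be sketched with a few more leaf values. This is a minimal illustration, assuming a made-up document with the hypothetical tags i, f, t and s; it only records which element class objectify picks for each leaf.

```python
from lxml import objectify

# Hypothetical sample document: one leaf per kind of content.
root = objectify.fromstring(
    "<root><i>5</i><f>1.5</f><t>true</t><s>text</s></root>"
)

# Map each child tag to the name of the element class chosen for it.
kinds = {child.tag: type(child).__name__ for child in root.iterchildren()}
print(kinds)
```

Leaves whose text does not match any of the more specific types fall back to StringElement.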

Child elements can be accessed directly with '.' syntax.

>>> root = objectify.Element("root")
>>> b1 = objectify.SubElement(root, "b")
>>> print(root.b.tag)
b

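Note that when several children share a tag name, the value returned by '.' syntax can also be treated as the sequence of children with that tag. This is because indexing an objectify element with [n] looks up its siblings, and iterating over it likewise walks the siblings that have the same tag name: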

>>> b2 = objectify.SubElement(root, "b")
>>> root.b[0] is b1, root.b[1] is b2
(True, True)
>>> for b in root.b: print(b.tag)
b
b

Pay particular attention to the following usage:

# iterating over an element itself
>>> for b in b1: print(b.tag)
b
b

For reference, here is the implementation in the source code (Cython):

cdef class ObjectifiedElement(ElementBase):
    def __iter__(self):
        u"""Iterate over self and all siblings with the same tag.
        """
        parent = self.getparent()
        if parent is None:
            return iter([self])
        return etree.ElementChildIterator(parent, tag=self.tag)

    def __len__(self):
        u"""Count self and siblings with the same tag.
        """
        return _countSiblings(self._c_node)

    def __getattr__(self, tag):
        u"""Return the (first) child with the given tag name.  If no namespace
        is provided, the child will be looked up in the same one as self.
        """
        if is_special_method(tag):
            return object.__getattr__(self, tag)
        return _lookupChildOrRaise(self, tag)

    def __getitem__(self, key):
        u"""Return a sibling, counting from the first child of the parent.  The
        method behaves like both a dict and a sequence.
        * If argument is an integer, returns the sibling at that position.
        * If argument is a string, does the same as getattr().  This can be
        used to provide namespaces for element lookup, or to look up
        children with special names (``text`` etc.).
        * If argument is a slice object, returns the matching slice.
        """
        cdef tree.xmlNode* c_self_node
        cdef tree.xmlNode* c_parent
        cdef tree.xmlNode* c_node
        cdef Py_ssize_t c_index
        if python._isString(key):
            return _lookupChildOrRaise(self, key)
        elif isinstance(key, slice):
            return list(self)[key]
        # normal item access
        c_index = key   # raises TypeError if necessary
        c_self_node = self._c_node
        c_parent = c_self_node.parent
        if c_parent is NULL:
            if c_index == 0:
                return self
            else:
                raise IndexError, unicode(key)
        if c_index < 0:
            c_node = c_parent.last
        else:
            c_node = c_parent.children
        c_node = _findFollowingSibling(
            c_node, tree._getNs(c_self_node), c_self_node.name, c_index)
        if c_node is NULL:
            raise IndexError, unicode(key)
        return elementFactory(self._doc, c_node)
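
The three branches of __getitem__ quoted above (string key, slice, integer index) can be exercised from Python. The following is a small sketch on a hypothetical sample document, not part of the lxml sources.

```python
from lxml import objectify

root = objectify.fromstring(
    "<root><b>10</b><b>11</b><a>test</a><b>12</b></root>"
)

# String key: same lookup as attribute access, returns the first <b>.
by_name = root["b"].pyval

# Negative integer: counts same-tag siblings from the end.
last_b = root.b[-1].pyval

# Slice: returns a plain list of the matching siblings.
tail = [el.pyval for el in root.b[1:]]

# An index past the last same-tag sibling raises IndexError.
try:
    root.b[5]
    out_of_range = False
except IndexError:
    out_of_range = True

print(by_name, last_b, tail, out_of_range)
```

The string form is useful for child names that cannot be written as attributes, such as names that collide with element properties like text.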

Iterating over root.X yields the children of root whose tag is X; to visit all children of root, use the iterchildren() or getchildren() methods.

>>> root = objectify.fromstring("<root><b>10</b><b>11</b><a>test</a><b>12</b></root>")
>>> [el.tag for el in root.b]
['b', 'b', 'b']
>>> [el.tag for el in root.iterchildren()]
['b', 'b', 'a', 'b']
>>> root.index(root.b[0]), root.index(root.b[1]), root.index(root.b[2])
(0, 1, 3)

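Similarly, len(elt) returns the number of siblings of elt that share its tag (including elt itself), while elt.countchildren() returns the number of children of elt.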
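
A short sketch of the difference between len() and countchildren(), using the same sample document as the session above:

```python
from lxml import objectify

root = objectify.fromstring(
    "<root><b>10</b><b>11</b><a>test</a><b>12</b></root>"
)

same_tag_siblings = len(root.b)          # the three <b> elements
all_children = root.countchildren()      # <b>, <b>, <a>, <b>
root_len = len(root)                     # the root has no same-tag siblings
leaf_children = root.b.countchildren()   # a data element has no children

print(same_tag_siblings, all_children, root_len, leaf_children)
```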

Miscellaneous

Element attributes are still read and written with the get and set methods.

>>> root.set('myattr', 'someval')
>>> root.get('myattr')
'someval'

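Assigning with '.' syntax can also add a child element. In that case the child (the whole subtree) is automatically deep copied, and the tag of the subtree's root is changed to the assigned name: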

>>> el = objectify.Element('other')
>>> root.c = el
>>> root.c.tag
'c'
>>> el.tag
'other'
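
Because the assigned element is deep copied, later changes to the original do not reach the tree. A minimal sketch, where the attribute name k is a hypothetical placeholder:

```python
from lxml import objectify

root = objectify.Element("root")
el = objectify.Element("other")
el.set("k", "v1")

# The subtree is copied into root and its root tag becomes "c".
root.c = el

# Changing the original afterwards leaves the copy untouched.
el.set("k", "v2")

copied_value = root.c.get("k")
original_value = el.get("k")
is_same_object = root.c is el

print(copied_value, original_value, is_same_object)
```

If the original element should end up in the tree itself rather than a copy, root.append(el) from the etree API moves it without copying or renaming.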

Assigning a list also works:

>>> root.c = [ objectify.Element("c"), objectify.Element("c") ]
>>> [el.tag for el in root.c]
['c', 'c']

Note that assigning a number, a string, or a similar value creates a data element and adds it to the tree:

>>> root.d = 1
>>> root.d
1
>>> type(root.d)
<class 'lxml.objectify.IntElement'>

For a data element, the .pyval attribute gives the corresponding Python value. The .text attribute is still a string.

>>> root.d.pyval
1
>>> root.d.text
'1'
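
The same mechanism applies to other plain Python values. The sketch below assigns a few of them under hypothetical child names (i, f, s, t, n) and collects the resulting element class and pyval for each.

```python
from lxml import objectify

root = objectify.Element("root")

# Each assignment creates a data element of the matching class.
root.i = 1
root.f = 1.5
root.s = "hi"
root.t = True
root.n = None

summary = {
    child.tag: (type(child).__name__, child.pyval)
    for child in root.iterchildren()
}
print(summary)

# .text stays a string regardless of the Python type.
int_text = root.i.text
```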

objectify provides the DataElement() factory function for creating data elements; the tag of the created element defaults to 'value'.

>>> el = objectify.DataElement(5, _pytype="int")
>>> el.pyval
5
>>> el.tag
'value'
>>> root.e = objectify.DataElement(5, _pytype="int")
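
One use of DataElement() is to force a type that inference would not pick. As a small sketch, the following keeps the text "5" as a string by passing _pytype="str", then attaches it under the hypothetical child name e:

```python
from lxml import objectify

# Without an explicit type, the text "5" would be read as an integer.
as_str = objectify.DataElement("5", _pytype="str")
as_int = objectify.DataElement(5, _pytype="int")

root = objectify.Element("root")
root.e = as_str  # copied into the tree and renamed to "e"

str_kind = type(as_str).__name__
int_kind = type(as_int).__name__
attached_kind = type(root.e).__name__
attached_value = root.e.pyval

print(str_kind, int_kind, attached_kind, repr(attached_value))
```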

Type annotations

Some functions (such as the Element() factory) automatically add namespace declarations and type annotations when they create an element:

>>> a = objectify.Element('a')
>>> etree.tostring(a)
b'<a xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="TREE"/>'

Others (such as fromstring) do not add these annotations:

>>> a = objectify.fromstring("<a>test</a>")
>>> etree.tostring(a)
b'<a>test</a>'

However, when the source data already carries type annotations, fromstring uses them to interpret the data:

>>> a = objectify.fromstring('<a xmlns:py="http://codespeak.net/lxml/objectify/pytype" py:pytype="str">5</a>')
>>> a
'5'
>>> etree.tostring(a)
b'<a xmlns:py="http://codespeak.net/lxml/objectify/pytype" py:pytype="str">5</a>'

The annotate and deannotate functions add and remove these annotations:

>>> objectify.deannotate(a, cleanup_namespaces=True)
>>> etree.tostring(a)
b'<a>5</a>'
>>> type(a)
<class 'lxml.objectify.StringElement'>
>>> a.attrib
{}
>>> objectify.annotate(a)
>>> etree.tostring(a)
b'<a xmlns:py="http://codespeak.net/lxml/objectify/pytype" py:pytype="int">5</a>'
>>> type(a)         # a is still a StringElement
<class 'lxml.objectify.StringElement'>
>>> a.pyval
'5'
>>> a.attrib        # but a's attrib dict has changed
{'{http://codespeak.net/lxml/objectify/pytype}pytype': 'int'}

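Note that deannotate does not remove namespace declarations by default, so cleanup_namespaces has to be set to True. Also note what may be a bug here: when annotate adds the annotation it ignores the class the element object already has, so the annotation can end up disagreeing with the class of the element itself. A likely explanation is that a is a Python proxy object created before the annotation was removed, and it keeps its class for as long as it is referenced.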
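
The mismatch can be checked from the other side: the sketch below repeats the session above and then parses the serialized result again, so that a fresh element object is built from the new annotation. The constant name PYTYPE is introduced here only for readability.

```python
from lxml import etree, objectify

# Fully qualified name of the py:pytype attribute.
PYTYPE = "{http://codespeak.net/lxml/objectify/pytype}pytype"

a = objectify.fromstring(
    '<a xmlns:py="http://codespeak.net/lxml/objectify/pytype" '
    'py:pytype="str">5</a>'
)
objectify.deannotate(a, cleanup_namespaces=True)
objectify.annotate(a)

# The annotation is inferred again from the text content.
annotation = a.get(PYTYPE)

# A freshly parsed element follows the new annotation.
reparsed = objectify.fromstring(etree.tostring(a))
reparsed_kind = type(reparsed).__name__

print(annotation, reparsed_kind, reparsed.pyval)
```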
