Some thoughts on converting XML to CSV

Today I got an XML file with no nesting and wanted to convert it to CSV. Here are two approaches.

Use lxml and XPath to pull each value out of the XML file, the way a web scraper would.

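import codecs  # used to write Chinese characters as UTF-8
from lxml import etree

# The file contains the special character SOH; recover=True lets the parser read past it
parser = etree.XMLParser(recover=True)
tree = etree.parse("car.xml", parser)

# Columns to export, in output order
columns = [
    "sale_id",
    # :
    # : (the columns between sale_id and rank are elided here)
    # :
    "rank",
    "update_time",
]

# Look up every column once instead of re-running the XPath query for each row
nodes = {column: tree.xpath("//" + column) for column in columns}

def find_str(column, i):
    """Return the text of the i-th <column> element, or "None" if it is missing or empty."""
    try:
        return nodes[column][i].xpath("text()")[0]
    except IndexError:
        return "None"

# The number of rows is the length of the longest column
row_count = max(len(found) for found in nodes.values())

with codecs.open("car2.csv", "w", encoding="utf-8") as f:
    for i in range(row_count):
        # Each value is followed by "|", including the last one in the row
        for column in columns:
            f.write(find_str(column, i))
            f.write("|")
        f.write("\n")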
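
Looking values up by position assumes every field appears exactly once per record; if one record lacks a field, the later rows of that column shift up. Below is a minimal sketch of a record-by-record alternative. It rests on assumptions that the post does not confirm: each record is wrapped in a `<car>` element, and the standard-library `xml.etree.ElementTree` and `csv` modules are swapped in for lxml. The sample data and the function name `records_to_csv` are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; the real layout of car.xml is not shown in the post.
SAMPLE = """<cars>
  <car><sale_id>1</sale_id><rank>3</rank><update_time>2017-11-01</update_time></car>
  <car><sale_id>2</sale_id><update_time>2017-11-02</update_time></car>
</cars>"""

def records_to_csv(xml_text, record_tag, columns):
    """Write one pipe-delimited row per <record_tag> element and return the text."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    for record in root.iter(record_tag):
        # findtext returns the default when the child element is missing
        row = [record.findtext(col, default="None") for col in columns]
        writer.writerow(row)
    return out.getvalue()

result = records_to_csv(SAMPLE, "car", ["sale_id", "rank", "update_time"])
print(result)
```

The second sample record has no `<rank>`, and its row still lines up with the other columns. Note that `ElementTree` has no equivalent of `recover=True`, so a file containing SOH would still need lxml or some cleaning before it could be parsed this way.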

Replace with regular expressions

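import codecs
import re

# Read the file as UTF-8 (assuming car.xml is UTF-8) so that Chinese characters survive
with codecs.open("car.xml", "r", encoding="utf-8") as f:
    fin = f.read()

# First strip all whitespace (this also removes spaces inside field values),
# then start a new line after each update_time, i.e. use update_time as the row separator
line_sep = re.sub(r'\s+', '', fin).replace('</update_time>', '\n')
# Use every closing tag </...> as the field separator, replacing it with a pipe
add_pipeline = re.sub(r'</\w+>', '|', line_sep)
# Delete all opening tags <...>
remove_tag_start = re.sub(r'<\w+>', '', add_pipeline)

with codecs.open("car3.csv", "w", encoding="utf-8") as fout:
    fout.write(remove_tag_start)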
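
To see what the three substitutions do, here is a small sketch that runs them on an inline string instead of a file. The sample fragment and the helper name `xml_to_pipe` are hypothetical; the real car.xml may well contain a root element or other tags that this fragment leaves out.

```python
import re

# Hypothetical two-record fragment with no wrapper elements
sample = """<sale_id>1</sale_id>
<rank>3</rank>
<update_time>2017-11-01 08:00:00</update_time>
<sale_id>2</sale_id>
<rank>5</rank>
<update_time>2017-11-02 09:30:00</update_time>
"""

def xml_to_pipe(text):
    # Strip whitespace, then end each row at </update_time>
    line_sep = re.sub(r"\s+", "", text).replace("</update_time>", "\n")
    # Remaining closing tags become field separators
    add_pipeline = re.sub(r"</\w+>", "|", line_sep)
    # Opening tags are dropped
    return re.sub(r"<\w+>", "", add_pipeline)

converted = xml_to_pipe(sample)
print(converted)
# 1|3|2017-11-0108:00:00
# 2|5|2017-11-0209:30:00
```

The output shows one side effect of removing all whitespace first: the space between the date and the time is lost. If that matters, one option is to strip only the whitespace between tags, for example with the pattern `r">\s+<"` replaced by `"><"`.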

Last edited: 2017.11.27 03:52:12
