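At work I ran into the following problem: when importing files, the header of a CSV file is determined by a separate XML file. If the two differ at import time, the only message is that the header does not match the XML definition, with no indication of where the difference is. With many fields, comparing them one by one is tedious, so I wrote this small script to solve the problem.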
How it works:
- Read the header from the CSV file
- Read the header defined in the XML file
- Compare the two and print whatever differs
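The three steps above can be tried without any real files. The following is only a minimal sketch on made-up data: the sample CSV and XML contents, the `data-model` and `data-field` tag names, and the helper names `csv_headers`, `model_headers` and `diff_headers` are all assumptions for illustration. Only the `background-element` tag and its `type-id` and `id` attributes come from the script itself.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; real files will use different ids and tag names.
SAMPLE_CSV = "row,be_01,name,age,city\n1,x,Ann,30,Paris\n"
SAMPLE_XML = """
<data-model>
  <background-element type-id="24" id="be_01">
    <data-field id="name"/>
    <data-field id="age"/>
    <data-field id="country"/>
  </background-element>
</data-model>
"""


def csv_headers(text):
    """Step 1: return (element_id, field_headers) from the first CSV row."""
    row = next(csv.reader(io.StringIO(text)))
    # Column 0 is skipped, column 1 carries the background element id.
    return row[1], row[2:]


def model_headers(xml_text, element_id):
    """Step 2: return the field ids defined for one background element."""
    root = ET.fromstring(xml_text)
    headers = []
    for element in root.findall("background-element[@type-id='24']"):
        if element.get("id") == element_id:
            # Children without an id attribute do not define a header.
            headers.extend(c.get("id") for c in element if c.get("id") is not None)
    return headers


def diff_headers(in_csv, in_model):
    """Step 3: return (only_in_csv, only_in_model), keeping the original order."""
    only_in_csv = [h for h in in_csv if h not in in_model]
    only_in_model = [h for h in in_model if h not in in_csv]
    return only_in_csv, only_in_model


element_id, in_csv = csv_headers(SAMPLE_CSV)
in_model = model_headers(SAMPLE_XML, element_id)
print(diff_headers(in_csv, in_model))
```

With this sample data, `city` appears only in the CSV and `country` only in the data model, which is exactly the kind of detail the import message leaves out.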
The full script is as follows:
#!/usr/bin/env python3
# coding=utf-8
import csv
import xml.etree.ElementTree as ET
import easygui
### read header from CSV file
print("Please select your CSV file then select Data Model")
csv_path = easygui.fileopenbox()
with open(csv_path, newline="") as f:
    headers_in_csv = next(csv.reader(f))
# drop the first column; the next one holds the background element id
del headers_in_csv[0]
background_element_id_in_csv = headers_in_csv[0]
del headers_in_csv[0]
print(headers_in_csv)
### read header configuration from data model
xml_path = easygui.fileopenbox()
root = ET.parse(xml_path).getroot()
headers_in_datamodel = []
for background_element in root.findall("background-element[@type-id='24']"):
    if background_element.attrib["id"] == background_element_id_in_csv:
        for child in background_element:
            # only children that carry an id attribute define a header
            if "id" in child.attrib:
                headers_in_datamodel.append(child.attrib["id"])
print(headers_in_datamodel)
### compare the two header lists
a = len(headers_in_csv)
b = len(headers_in_datamodel)
print(a, b)
if headers_in_csv == headers_in_datamodel:
    print("same header")
else:
    # report differences in both directions, even when the lengths differ
    for header in headers_in_csv:
        if header not in headers_in_datamodel:
            print("only in csv:", header)
    for header in headers_in_datamodel:
        if header not in headers_in_csv:
            print("only in data model:", header)
    print("Header not match, please check your csv file and data model!")
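One case the membership check in the script does not cover is when both files contain the same fields in a different order: the lists are unequal, yet no individual header is reported. If column order matters for the import (an assumption, since the script does not say), a position-by-position comparison can fill that gap. This is a minimal sketch; `positional_diff` is a hypothetical helper and the header values are made up.

```python
from itertools import zip_longest


def positional_diff(in_csv, in_model):
    """Return (index, csv_header, model_header) for every position that differs."""
    mismatches = []
    # zip_longest pads the shorter list with None, so extra columns show up too.
    pairs = zip_longest(in_csv, in_model)
    for index, (csv_header, model_header) in enumerate(pairs):
        if csv_header != model_header:
            mismatches.append((index, csv_header, model_header))
    return mismatches


# Hypothetical headers: same fields, two of them swapped.
for index, csv_header, model_header in positional_diff(
        ["name", "age", "city"], ["name", "city", "age"]):
    print(index, csv_header, model_header)
```

The index counts from the first data field, so in the CSV file itself the column sits two positions further to the right, because the script drops the first two columns before comparing.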