2018-06-12
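When implementing the TransE algorithm, the data-preparation stage needs to read entity2id.txt and relation2id.txt and turn each file into a dictionary.
I came across two ways to read them:
1. Open the file with open, read it line by line, and build the dictionary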
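import os
sp = '\t'
entity2id = {}
entity_num = 0
file_path = os.path.join(data_dir, "entity2id.txt")
with open(file_path, "r") as f:
    for line in f:
        # each line is "<entity><TAB><id>"
        entityAndId = line.strip().split(sp)
        entity2id[entityAndId[0]] = entityAndId[1]  # note: the id is stored as a string
        entity_num += 1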
2. Read the whole file at once with pandas, then convert it to a dictionary with the zip function
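import os
import pandas as pd
file_path = os.path.join(data_dir, "entity2id.txt")
# header=None: the file has no header row; read_table splits on tabs by default
entity_df = pd.read_table(file_path, header=None)
entity2id = dict(zip(entity_df[0], entity_df[1]))
entity_num = len(entity2id)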
So the question is: which of the two is better?
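One way to start answering this is to measure both approaches on the same file. The sketch below is only a rough harness, not a definitive benchmark: the sample file is synthetic, the entity names (`/m/entity_0`, ...) and the helper names `write_sample`, `load_with_open`, and `load_with_pandas` are made up for illustration, and `pd.read_csv(..., sep="\t")` is used as the tab-separated equivalent of `pd.read_table`. It also makes one difference explicit: the line-by-line version only yields integer IDs if the ID is converted with `int()`, whereas pandas infers the integer column on its own.

```python
import os
import tempfile
import timeit


def write_sample(file_path, n):
    """Write n synthetic "<entity><TAB><id>" lines (hypothetical entity names)."""
    with open(file_path, "w", encoding="utf-8") as f:
        for i in range(n):
            f.write(f"/m/entity_{i}\t{i}\n")


def load_with_open(file_path, sep="\t"):
    """Method 1: read line by line and build the dict by hand."""
    mapping = {}
    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(sep)
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            mapping[parts[0]] = int(parts[1])  # convert the id explicitly
    return mapping


def load_with_pandas(file_path, sep="\t"):
    """Method 2: read the whole file with pandas, then zip the two columns."""
    import pandas as pd  # imported here so method 1 works without pandas

    df = pd.read_csv(file_path, sep=sep, header=None)
    return dict(zip(df[0], df[1]))


if __name__ == "__main__":
    tmp_dir = tempfile.mkdtemp()
    path = os.path.join(tmp_dir, "entity2id.txt")
    write_sample(path, 10000)

    # Time a few repeated loads of the same file with each method.
    t_open = timeit.timeit(lambda: load_with_open(path), number=5)
    print(f"open():  {t_open:.4f} s for 5 loads")
    try:
        t_pandas = timeit.timeit(lambda: load_with_pandas(path), number=5)
        print(f"pandas:  {t_pandas:.4f} s for 5 loads")
    except ImportError:
        print("pandas is not installed; skipped method 2")
```

The numbers depend on file size, machine, and pandas version, so it is worth running this on the real entity2id.txt before drawing a conclusion. Speed aside, the pandas version is shorter, while the plain `open` version has no third-party dependency and makes it easy to skip or repair malformed lines.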