1. Purpose
- Extract the family name from a taxonomy string
cat test.txt
k__Bacteria;p__Verrucomicrobia;c__Verrucomicrobiae;o__Verrucomicrobiales;f__Akkermansiaceae;g__Akkermansia;
2. Python 3 code
vi get_family.py
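#!/usr/bin/env python3
import re
import sys

# Usage: python3 get_family.py <taxonomy_file>
_, infile = sys.argv
with open(infile) as f:
    text = f.read()
# Ranks are separated by ';', so capture everything between 'f__' and the next ';'
families = re.findall(r'f__([^;]*)', text)
print(''.join(families))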
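- read(): reads the entire file into memory at once; suitable for small files. For large files, iterate over the file line by line with a for loop instead.
- re.findall(pattern, text): searches the whole text for every non-overlapping match of the regular expression and returns them as a list; when the pattern contains one capture group, the list holds only the captured text.
- ''.join(): concatenates the list elements into a single string for output. With several taxonomy lines, '\n'.join() prints one family per line.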
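
The first note recommends reading large files line by line. A minimal sketch of that variant follows, assuming the same ';'-separated rank format as test.txt. The names FAMILY_RE and iter_families are hypothetical and not part of the original script.

```python
#!/usr/bin/env python3
import re
import sys

# Assumption: ranks are ';'-separated and the family field starts with 'f__'.
FAMILY_RE = re.compile(r'f__([^;\s]+)')


def iter_families(lines):
    """Yield the family name from each line that has a non-empty 'f__' field."""
    for line in lines:
        match = FAMILY_RE.search(line)
        if match:
            yield match.group(1)


if __name__ == '__main__' and len(sys.argv) > 1:
    # Iterating over the file object reads one line at a time,
    # so the whole file is never held in memory at once.
    with open(sys.argv[1]) as handle:
        for family in iter_families(handle):
            print(family)
```

Saved under a file name of your choice, it is called the same way as get_family.py, with the taxonomy file as the only argument. Lines without a family field are skipped rather than printed as blanks.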
3. Test
python3 get_family.py test.txt
Akkermansiaceae
Reference:
https://blog.csdn.net/Fairy_teacher/article/details/106916453