Extracting base-filtering information from QC log files (Python implementation)

First, prepare a list of file paths (one fastp log per line):

head -n 2 pwd.list
/public1/home/scb6498/02_clean_data/final_300/disk1/wait/GM002-F300_disk1-1_fastp.txt
/public1/home/scb6498/02_clean_data/final_300/disk1/wait/GM003-F300_disk1-1_fastp.txt
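The post does not show how pwd.list was generated. Below is a minimal sketch that collects the paths with `pathlib`. It assumes every log ends in `_fastp.txt` and sits somewhere under one root directory. The helper name `write_path_list` and its parameters are hypothetical.

```python
from pathlib import Path


def write_path_list(root, out="pwd.list", pattern="*_fastp.txt"):
    """Write the absolute path of every file under `root` matching `pattern`, one per line.

    Hypothetical helper: the "*_fastp.txt" pattern is assumed from the example paths.
    """
    # rglob searches all subdirectories; sorting keeps the sample order stable
    paths = sorted(str(p.resolve()) for p in Path(root).rglob(pattern) if p.is_file())
    with open(out, "w") as fo:
        for p in paths:
            fo.write(p + "\n")
    return paths
```

For example, `write_path_list("/public1/home/scb6498/02_clean_data/final_300")` would write every matching log below that directory into pwd.list in the current directory.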

Some of the code is not strictly necessary; I forgot to delete it.

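#!/usr/bin/env python3
import re

# pwd.list holds one fastp log path per line
with open("pwd.list", "r") as fi:
    fi.seek(0)  # not strictly needed: a newly opened file already starts at offset 0
    total_txt = fi.readlines()
num = len(total_txt)

with open("samstat.txt", "w") as fo:
    # header
    fo.write("name\tpwd\tbeforereads\tafterreads\tbeforebases\tafterbases\n")
    for i in range(0, num):  # iterating over total_txt directly would work just as well
        pwd = total_txt[i].strip()
        if not pwd:
            continue  # skip blank lines, e.g. a trailing empty line in pwd.list
        readslist = []
        baseslist = []
        with open(pwd, "r") as readfile:
            for j in readfile:
                # A paired-end fastp log prints "total reads:" and "total bases:" in four
                # sections: Read1 before, Read2 before, Read1 after, Read2 after filtering
                if "total reads:" in j:
                    readslist.append(int(j.split(":")[-1]))
                elif "total bases:" in j:
                    baseslist.append(int(j.split(":")[-1]))
        # Sum Read1 + Read2 before and after filtering
        before_reads = str(readslist[0] + readslist[1])
        after_reads = str(readslist[2] + readslist[3])
        before_bases = str(baseslist[0] + baseslist[1])
        after_bases = str(baseslist[2] + baseslist[3])
        # name: text before "_disk"/"-disk" in the file name, e.g. GM002-F300
        name = re.split(r"/|_disk|-disk", pwd)[-2]
        # pwd column: file name without "_fastp.txt", e.g. GM002-F300_disk1-1
        label = re.split(r"/|_fastp\.", pwd)[-2]
        fo.write("\t".join([name, label, before_reads, after_reads, before_bases, after_bases]) + "\n")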
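The core of the script counts the four `total reads:` and four `total bases:` lines and derives two name columns from the path. The same logic can be pulled into small functions and tested on a string without touching the disk. This is a minimal sketch, assuming a paired-end fastp log in which those lines appear in the order Read1 before, Read2 before, Read1 after, Read2 after filtering. The function names `summarize_fastp_log` and `sample_names` and the sample log text are hypothetical.

```python
import re


def summarize_fastp_log(text):
    """Sum Read1 + Read2 totals before and after filtering from a paired-end fastp log.

    Assumes four "total reads:" and four "total bases:" lines in the order
    Read1 before, Read2 before, Read1 after, Read2 after filtering.
    """
    reads = [int(m) for m in re.findall(r"^[ \t]*total reads:[ \t]*(\d+)", text, re.M)]
    bases = [int(m) for m in re.findall(r"^[ \t]*total bases:[ \t]*(\d+)", text, re.M)]
    if len(reads) < 4 or len(bases) < 4:
        raise ValueError("expected 4 'total reads:' and 4 'total bases:' lines")
    return {
        "beforereads": reads[0] + reads[1],
        "afterreads": reads[2] + reads[3],
        "beforebases": bases[0] + bases[1],
        "afterbases": bases[2] + bases[3],
    }


def sample_names(path):
    """Return the (name, pwd) columns the script derives from a log path."""
    name = re.split(r"/|_disk|-disk", path)[-2]
    label = re.split(r"/|_fastp\.", path)[-2]
    return name, label


# Hypothetical, abbreviated log text laid out like fastp's summary sections
example = """Read1 before filtering:
total reads: 1000
total bases: 150000
Read2 before filtering:
total reads: 1000
total bases: 150000
Read1 after filtering:
total reads: 980
total bases: 145000
Read2 after filtering:
total reads: 980
total bases: 144000
"""

print(summarize_fastp_log(example))
# → {'beforereads': 2000, 'afterreads': 1960, 'beforebases': 300000, 'afterbases': 289000}
print(sample_names("/public1/home/scb6498/02_clean_data/final_300/disk1/wait/GM002-F300_disk1-1_fastp.txt"))
# → ('GM002-F300', 'GM002-F300_disk1-1')
```

Raising an error when fewer than four values are found makes a truncated or single-end log fail loudly. The original script would instead stop with an `IndexError`.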