Python: computing the GC content of each sequence in a line-wrapped FASTA file

1. Input file

cat test.txt

>sequence_1
AAAA
GGGG
CCCC
TTTT
>sequence_2
aaaa
gggg
cccc
tttt

2. Python 3 code

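#!/usr/bin/env python3
# Usage: fasta_gc_percent.py <input_fasta> <output_tsv>
import sys

_, infile, outfile = sys.argv

# Map each sequence ID to the list of its (possibly wrapped) sequence lines.
seqs = {}
key = None
with open(infile) as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            key = line[1:]  # header without the leading ">"
            seqs[key] = []
        elif key is not None:  # ignore any text before the first header
            seqs[key].append(line)

with open(outfile, 'w') as o:
    o.write("id\tGC_percent\n")
    for key, value in seqs.items():
        seq = ''.join(value).upper()  # count upper- and lowercase bases alike
        n_gc = seq.count("G") + seq.count("C")
        # Despite the column name, this is a fraction between 0 and 1.
        gc_percent = n_gc / len(seq) if seq else 0.0
        o.write("{}\t{}\n".format(key, gc_percent))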
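
The script above keeps every sequence in memory before writing anything. For larger files, the same idea can be sketched as a generator that yields one record at a time. This is only an illustration: the names `iter_fasta` and `gc_fraction` are hypothetical helpers, not part of the original script, and the sketch assumes a plain FASTA file whose header lines start with ">".

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs, joining sequences wrapped over several lines."""
    key: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if key is not None:
                yield key, "".join(chunks)  # emit the previous record
            key = line[1:]
            chunks = []
        elif key is not None:
            chunks.append(line)
    if key is not None:
        yield key, "".join(chunks)  # emit the last record


def gc_fraction(seq: str) -> float:
    """Return the fraction (0 to 1) of G and C bases, ignoring case."""
    seq = seq.upper()
    if not seq:
        return 0.0  # assumption: report 0.0 for an empty sequence
    return (seq.count("G") + seq.count("C")) / len(seq)


# Same records as test.txt, given as a list of lines.
example = [
    ">sequence_1", "AAAA", "GGGG", "CCCC", "TTTT",
    ">sequence_2", "aaaa", "gggg", "cccc", "tttt",
]
results = [(name, gc_fraction(seq)) for name, seq in iter_fasta(example)]
for name, value in results:
    print(name, value, sep="\t")
```

Because `iter_fasta` accepts any iterable of lines, an open file object can be passed to it directly, e.g. `iter_fasta(open(infile))`.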

3. Results

python3 ../fasta_gc_percent.py test.txt out.txt
cat out.txt
id      GC_percent
sequence_1      0.5
sequence_2      0.5
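
As a quick sanity check, the value for sequence_1 can be worked out by hand: its four lines join into 16 bases, 8 of which are G or C. The short sketch below repeats that arithmetic and also shows how to convert the reported fraction into a true percentage; the variable names are illustrative only.

```python
# The 16 bases of sequence_1 after joining its four wrapped lines.
seq = "AAAA" + "GGGG" + "CCCC" + "TTTT"

# 4 G + 4 C = 8 GC bases out of 16.
gc_fraction = (seq.count("G") + seq.count("C")) / len(seq)

# The GC_percent column holds this fraction; multiply by 100 for a percentage.
gc_percentage = 100 * gc_fraction

print(gc_fraction, gc_percentage)
```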
