该程序是获得unique比对序列长度分布信息。
from collections import Counter
Fileout = open('unique.sam','r')
Filein = open('unique_static.txt','w')
dic, arr = {}, []
for line in Fileout.readlines():
if len(line.split()) >3 :
arr.append(len(str(line.split('\t')[9])))
if not line:
break
dic = Counter(arr)for k,v in dic.iteritems():
print>>Filein, k, ',', v
上面的程序无法流程化处理,可以调用python的sys来实现流程化的处理。
samtools view *.sam | grep "\bNH:i:1\b" | python pyproj.py
#grep截取unique比对的序列,标准输出再定向给python程序
import sys
from collections import Counter
lines = sys.stdin.readlines()
dic, arr = {}, []
for line in lines:
if len(line.split()) >9 :#有时候>12会更准确
arr.append(len(str(line.split('\t')[9])))
dic = Counter(arr)
for k,v in dic.iteritems():
print k, ',', v