1.获取给定DNA序列的反向互补序列:
思路:从一个文件中导入序列,然后将互补后的结果输出到另外一个文件中。先定义的一个字典: complement = {'C': 'G', 'G': 'C', 'T': 'A', 'A': 'T'}
然后 for i in list(seq):
rev_dna += complement[i]
rev_dna = rev_dna[::-1]
print (rev_dna)
直接得到互补后的结果
n = 0
complement = {'A':'T','G':'C','C':'G','T':'A'}
rev_seq: str= ''
with open(r'/home/yt/PycharmProjects/test/1minia_result.txt', 'w') as f1:
with open(r'/home/yt/PycharmProjects/test/1minia.contigs.fa', 'r') as f:
for dna_seq in f:
n +=1
if n < 100:
rev_seq = ''
if dna_seq.startswith(">"):
f1.write(dna_seq)
else:
dna_seq = list(dna_seq.strip())
for i in dna_seq:
rev_seq += complement[i]
rev_seq = rev_seq[::-1]
f1.write(rev_seq)
f1.write('\n')
2.把DNA转换成RNA
思路:将DNA序列中的T换成U就可以啦,直接import re,然后用sub就行啦
import re
dnaSeq = ''
with open('/home/yt/PycharmProjects/test/1minia.contigs.fa') as f:
for line in f:
line = line.rstrip()
dnaSeq += line.upper()
rnaSeq1 = re.sub('T', 'U', dnaSeq)
rnaSeq2 = dnaSeq.replace('T', 'U')
print(rnaSeq1)
print(rnaSeq2)
3.提取指定ID的序列
思路:将序列存入字典,打开字典读取序列,利用字典提取指定ID序列
dict={}
fw = open('/home/yt/PycharmProjects/test/1minia.result2.txt', 'w')
fr = open('/home/yt/PycharmProjects/test/1minia.contigs.fa', 'r')
for line in fr:
if line.startswith('>'):
line = line.strip('\n')
line = line.replace(' ', '')
name = line
dict[name] = ''
else:
dict[name] += line.replace('\n','')
fr.close()
for ID in dict.keys():
if ID == '>14__len__4122':
fw.write(ID + "\n" + dict[ID])
fw.close()
4.每行指定长度输出序列
思路:使用循环输出列表,利用 计数器控制输出数量,当输出到第十个,计数器归零,重新开始计数,print输出增加end参数可以控制输出后以什么结尾,这里使用range方法快速生成10-90的数字添加进list列表
results = list(range(10, 90))
n = 10 # 每10个数换一行
for i in range(len(results)):
print(results[i], end=' ')
if (i+1) % 10 == 0:
print("\n")
with open(r'/home/yt/PycharmProjects/test/1minia.result4.txt', 'w') as f1:
with open(r'/home/yt/PycharmProjects/test/1minia.contigs.fa', 'r') as f:
seq = ''
for line in f:
seq = seq.strip('\n')
if line.startswith('>'):
seq = line
f1.write(seq)
else:
seqlist = list(line)
seqlist = seqlist[0:20]
seq = "".join(seqlist)
f1.write(seq+'\n')
f.close
5.按照序列长度/名字排序
fasta = {}
f1 = open('/home/yt/PycharmProjects/test/1minia.result5.txt', 'w')
f = open('/home/yt/PycharmProjects/test/1minia.contigs.fa', 'r')
sortseq = ''
for line in f:
if line.startswith('>'):
line = line.strip('\n')
name = line + '\n'
fasta[name] = ''
else:
fasta[name] = line
for key, value in fasta.items():
seqsort = sorted(fasta.items(), key=lambda i:len(i[1]))
for n in seqsort:
for t in n:
sortseq += str(t)
f1.write(sortseq)
6.RNA翻译为蛋白质
首先我们要建立一个dictionary将RNA codon table存进去,key为RNA 序列,value是对应的氨基酸。
import re
from collections import OrderedDict
codonTable = OrderedDict()
f1 = open('/home/yt/PycharmProjects/test/RNA.result','w')
f = open('/home/yt/PycharmProjects/test/RNA','r')
for line in f:
line = line.rstrip()
lst = re.split('\s+', line)
for i in [0, 2, 4, 6]:
codonTable[lst[i]] = lst[i + 1]
rnaSeq = ''
with open('/home/yt/PycharmProjects/test/RNA', 'rt') as f:
for line in f:
line = line.rstrip()
rnaSeq += line.upper()
aminoAcids = []
i = 0
while i < len(rnaSeq):
codon = rnaSeq[i:i + 3]
if codonTable[codon] != 'Stop':
aminoAcids.append(codonTable[codon])
i += 3
peptide = ''.join(aminoAcids)
f1.write(peptide)