rosalind练习题十七

# Problem

# For positive integers a and n, a modulo n (written amodn in shorthand) is the remainder when a is divided by n. For example, 29mod11=7 because 29=11×2+7.

# Modular arithmetic is the study of addition, subtraction, multiplication, and division with respect to the modulo operation. We say that a and b are congruent modulo n if amodn=bmodn; in this case, we use the notation a≡bmodn.

# Two useful facts in modular arithmetic are that if a≡bmodn and c≡dmodn, then a+c≡b+dmodn and a×c≡b×dmodn. To check your understanding of these rules, you may wish to verify these relationships for a=29, b=73, c=10, d=32, and n=11.

# As you will see in this exercise, some Rosalind problems will ask for a (very large) integer solution modulo a smaller number to avoid the computational pitfalls that arise with storing such large numbers.

#

# Given: A protein string of length at most 1000 aa.

# Return: The total number of different RNA strings from which the protein could have been translated, modulo 1,000,000. (Don't neglect the importance of the stop codon in protein translation.)

#

# Sample Dataset

# MA

# Sample Output

# 12

# 这道题目是关于模运算和蛋白质翻译的问题。给定一个不超过1000个氨基酸(字母“F”,“L”,“I”,“M”,“V”,“S”,“P”,“T”,“A”,“Y”,“H”,“Q”,“N”,“K”,“D”,“E”,“C”,“W”,“R”,“G”的字符串),求可以翻译成该蛋白质的不同RNA序列数目,答案对1,000,000取模。

# 简单介绍一下蛋白质翻译。RNA是由四种碱基(腺嘌呤、尿嘧啶、胞嘧啶和鸟嘌呤)组成的。氨基酸则由RNA上的三个碱基编码而来,其中有一个“起始密码子”(AUG),还有三个“停止密码子”(UAA,UAG和UGA),分别表示蛋白质链的开端和结束。因此,一个蛋白质可以用RAN序列编码,一个RNA序列也可能被翻译成不同的蛋白质。

# 这里需要用到模运算。模运算是一种基本的整数运算,它给出了除法的余数。例如a mod n(记作amodn)表示a除以n的余数。当比较大的整数进行计算时,可以使用模运算来简化问题。如果两个整数在模n时相等,则它们被称为模n同余。

# 这道题目的思路如下:

# 准备好DNA中20个氨基酸对应的RNA序列表格;

# 对于给定的蛋白质序列,根据RNA序列表格找到所有可能的RNA序列组合;

# 注意考虑到终止密码子;

# 对于所有可能的RNA序列,计算它们的总数,并将结果对1,000,000取模。

# 计算RNA序列数

def count_rna_strings(protein_string):

    # 翻译表

    codon_table = {

        "UUU": "F", "CUU": "L", "AUU": "I", "GUU": "V",

        "UUC": "F", "CUC": "L", "AUC": "I", "GUC": "V",

        "UUA": "L", "CUA": "L", "AUA": "I", "GUA": "V",

        "UUG": "L", "CUG": "L", "AUG": "M", "GUG": "V",

        "UCU": "S", "CCU": "P", "ACU": "T", "GCU": "A",

        "UCC": "S", "CCC": "P", "ACC": "T", "GCC": "A",

        "UCA": "S", "CCA": "P", "ACA": "T", "GCA": "A",

        "UCG": "S", "CCG": "P", "ACG": "T", "GCG": "A",

        "UAU": "Y", "CAU": "H", "AAU": "N", "GAU": "D",

        "UAC": "Y", "CAC": "H", "AAC": "N", "GAC": "D",

        "UAA": "*", "CAA": "Q", "AAA": "K", "GAA": "E",

        "UAG": "*", "CAG": "Q", "AAG": "K", "GAG": "E",

        "UGU": "C", "CGU": "R", "AGU": "S", "GGU": "G",

        "UGC": "C", "CGC": "R", "AGC": "S", "GGC": "G",

        "UGA": "*", "CGA": "R", "AGA": "R", "GGA": "G",

        "UGG": "W", "CGG": "R", "AGG": "R", "GGG": "G"

    }

    # 统计RNA序列数量

    rna_count = 1  # 起始为1,包含终止密码子

    for aa in protein_string:

        codons_count = list(codon_table.values()).count(aa)

        rna_count = (rna_count * codons_count) % 1000000

    return rna_count * 3  # 乘上三个终止密码子

# 给出蛋白质序列

protein_string = "MA"

# 输出结果

print(count_rna_strings(protein_string))

©著作权归作者所有,转载或内容合作请联系作者
【社区内容提示】社区部分内容疑似由AI辅助生成,浏览时请结合常识与多方信息审慎甄别。
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

相关阅读更多精彩内容

友情链接更多精彩内容