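0. Background
Introduction to non-coding RNA: see the Baidu Baike entry.
miRNA naming rules:
For details, see https://mp.weixin.qq.com/s/SnrUB8v0_Mzlg2f4nHju3Q
1. Downloading the sequences
Ensembl Plants
Arabidopsis thaliana is listed directly under "Favourite genomes".
Download the genome sequence and annotation, along with the gene, cDNA, and ncRNA sequences.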
Arabidopsis_thaliana.TAIR10.42.gff3.gz
Arabidopsis_thaliana.TAIR10.cdna.abinitio.fa.gz
Arabidopsis_thaliana.TAIR10.cdna.all.fa.gz
Arabidopsis_thaliana.TAIR10.cds.all.fa.gz
Arabidopsis_thaliana.TAIR10.dna.toplevel.fa
Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz
Arabidopsis_thaliana.TAIR10.ncrna.fa.gz
Arabidopsis_thaliana.TAIR10.pep.abinitio.fa.gz
Arabidopsis_thaliana.TAIR10.pep.all.fa.gz
$ gzip -dc Arabidopsis_thaliana.TAIR10.42.gff3.gz | grep -v "^#" | cut -f3 | sort | uniq -c
 286067 CDS
      7 chromosome
 313952 exon
  56384 five_prime_UTR
  27655 gene
   3879 lnc_RNA
    325 miRNA
  48359 mRNA
    377 ncRNA
   5178 ncRNA_gene
     15 rRNA
    287 snoRNA
     82 snRNA
  48308 three_prime_UTR
    689 tRNA
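Beyond counting features, the same GFF3 file can give a rough idea of how long each class of ncRNA is. The following is only a minimal sketch: the function name `ncrna_length_summary` is made up for illustration, and the list of feature types is simply the set of ncRNA types that appear in the count above.

```shell
# ncrna_length_summary FILE.gff3.gz   (hypothetical helper, not part of any tool)
# Prints one line per ncRNA feature type: type, count, min, mean and max length (nt).
ncrna_length_summary() {
    gzip -dc "$1" | awk -F'\t' '
        # Skip comment and directive lines.
        /^#/ { next }
        # Keep only the ncRNA feature types seen in the count above.
        $3 ~ /^(lnc_RNA|miRNA|ncRNA|rRNA|snoRNA|snRNA|tRNA)$/ {
            len = $5 - $4 + 1            # GFF3 coordinates are 1-based and inclusive
            n[$3]++; sum[$3] += len
            if (!($3 in min) || len < min[$3]) min[$3] = len
            if (!($3 in max) || len > max[$3]) max[$3] = len
        }
        END {
            for (t in n)
                printf "%s\t%d\t%d\t%.1f\t%d\n", t, n[t], min[t], sum[t] / n[t], max[t]
        }' | sort
}
```

It could be called as `ncrna_length_summary Arabidopsis_thaliana.TAIR10.42.gff3.gz`. The lengths are feature spans on the genome, so for spliced features such as lnc_RNA they include introns.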
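2. Non-coding RNA databases
2.1 PNRD
Developed by China Agricultural University: http://structuralbiology.cau.edu.cn/PNRD/download.php
The data are easy to download, but the database no longer appears to be updated.
Download all Arabidopsis non-coding RNA sequences and the target information.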
$ ll
total 24M
-rw-r--r-- 1 huangsiyuan grp3 2.9M Jan 19 11:41 lncRNA.txt
-rw-r--r-- 1 huangsiyuan grp3 64K Jan 19 11:40 miRNA.txt
-rw-r--r-- 1 huangsiyuan grp3 26K Jan 19 11:41 snoRNA.txt
-rw-r--r-- 1 huangsiyuan grp3 8.5K Jan 19 11:42 snRNA.txt
-rw-r--r-- 1 huangsiyuan grp3 233K Jan 19 11:40 stem_loop.txt
-rw-r--r-- 1 huangsiyuan grp3 21M Jan 19 11:44 tar.txt
-rw-r--r-- 1 huangsiyuan grp3 948 Jan 19 11:42 tasiRNA.txt
-rw-r--r-- 1 huangsiyuan grp3 67K Jan 19 11:42 tRNA.txt
Take a look at the downloaded sequences:
$ head -n 6 miRNA.txt # mature miRNA sequences
>ath-miR156a
UGACAGAAGAGAGUGAGCAC
>ath-miR156b
UGACAGAAGAGAGUGAGCAC
>ath-miR156c
UGACAGAAGAGAGUGAGCAC
huangsiyuan 13:57:16 ~/learn_rnaseq/srna_project/ref_ncrna
$ head -n 6 stem_loop.txt # stem-loop (precursor) sequences
>ath-MIR156a
CAAGAGAAACGCAAAGAAACUGACAGAAGAGAGUGAGCACACAAAGGCAAUUUGCAUAUCAUUGCACUUGCUUCUCUUGCGUGCUCACUGCUCUUUCUGUCAGAUUCCGGUGCUGAUCUCUUU
>ath-MIR156b
GCUAGAAGAGGGAGAGAUGGUGAUUGAGGAAUGCAACAGAGAAAACUGACAGAAGAGAGUGAGCACAUGCAGGCACUGUUAUGUGUCUAUAACUUUGCGUGUGCGUGCUCACCUCUCUUUCUGUCAGUUGCCUAUCUCUGCCUGCUUGACCUCUCUCUCUCUCUCUCUCUCUCAAAUUUGGCU
>ath-MIR156c
CGCAUAGAAACUGACAGAAGAGAGUGAGCACACAAAGGCACUUUGCAUGUUCGAUGCAUUUGCUUCUCUUGCGUGCUCACUGCUCUAUCUGUCAGAUUCCGGCU
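The records in the two files are in the same order: the n-th mature miRNA in miRNA.txt comes from the n-th stem-loop in stem_loop.txt.
Different stem-loop sequences can still give rise to identical or highly similar mature miRNAs, as the three miR156 entries above show.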
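Both observations can be checked mechanically. The sketch below pairs the records of the two files by position and reports whether each mature sequence occurs inside its stem-loop. The function name `check_mature_in_stemloop` is hypothetical, and the approach assumes that both files are plain FASTA with records in matching order, as the `head` output suggests.

```shell
# check_mature_in_stemloop MATURE.fa STEMLOOP.fa   (hypothetical helper)
# Prints: mature name, stem-loop name, and "found" or "missing".
check_mature_in_stemloop() {
    awk '
        # First file: mature sequences, stored by record index.
        FNR == NR {
            if (/^>/) { n++; mname[n] = substr($0, 2) } else { mseq[n] = mseq[n] $0 }
            next
        }
        # Second file: stem-loop sequences, stored the same way.
        /^>/ { m++; lname[m] = substr($0, 2); next }
        { lseq[m] = lseq[m] $0 }
        END {
            # Pair record i of the first file with record i of the second.
            for (i = 1; i <= n; i++) {
                status = (index(lseq[i], mseq[i]) > 0) ? "found" : "missing"
                printf "%s\t%s\t%s\n", mname[i], lname[i], status
            }
        }' "$1" "$2"
}
```

A call such as `check_mature_in_stemloop miRNA.txt stem_loop.txt | grep -c missing` would then count the pairs that do not line up. A non-zero count would suggest that the two files are not in the same order after all.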
2.2 miRBase
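miRBase, the microRNA database, has the most complete microRNA collection and is still being updated. It lets you look up stem-loop sequences, mature miRNA sequences, and targets. It does not seem to offer a direct per-species sequence download, so one option is to derive the sequences from the GFF3 file it provides; the all-species FASTA files (hairpin.fa, mature.fa) on its download page can also be filtered by the ath- prefix.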
http://www.mirbase.org/
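As a rough sketch of the GFF3-to-sequence route, the first step could be to turn the miRBase GFF3 into BED intervals. Everything specific here is an assumption rather than something confirmed above: the function name `mirbase_gff3_to_bed` is hypothetical, and the code assumes that the file uses the feature types `miRNA_primary_transcript` (stem-loop) and `miRNA` (mature) and carries a `Name=` attribute in column 9.

```shell
# mirbase_gff3_to_bed FILE.gff3 [FEATURE_TYPE]   (hypothetical helper)
# Writes BED6 lines: chrom, start (0-based), end, name, score, strand.
mirbase_gff3_to_bed() {
    awk -F'\t' -v OFS='\t' -v type="${2:-miRNA_primary_transcript}" '
        /^#/ { next }
        $3 == type {
            name = "."
            # Pull the Name=... field out of the attribute column, if present.
            k = split($9, attr, ";")
            for (i = 1; i <= k; i++)
                if (attr[i] ~ /^Name=/) name = substr(attr[i], 6)
            # BED starts are 0-based, GFF3 starts are 1-based.
            print $1, $4 - 1, $5, name, 0, $7
        }' "$1"
}
```

The resulting intervals could then be passed to a tool such as `bedtools getfasta -fi genome.fa -bed hairpins.bed -s -name` to pull strand-aware sequences out of the genome. Two things to watch for: chromosome names in the GFF3 may not match those in the Ensembl FASTA, and the genome is DNA (T) whereas the miRNA databases write sequences as RNA (U).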
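How do you look up the precursor structure or sequence of a miRNA?
[Screenshots 1-3: lookup steps on the miRBase website]
After processing, the sequences on the two arms of the stem-loop:
[Screenshot 4]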