This article is reposted from http://www.cnblogs.com/ZHshuang463508120/p/3593679.html
I. Installing miRDeep2
Download and unpack
wget http://mdc.helmholtz.de/38350089/en/research/research_teams/systems_biology_of_gene_regulatory_elements/projects/miRDeep/mirdeep2_0_0_5.zip
unzip mirdeep2_0_0_5.zip
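If you install with the install.pl script bundled with miRDeep2, some of the files it tries to download may no longer exist, bowtie for example.
In that case you need to install several programs yourself. The README in the unpacked directory describes in detail how to install mirdeep2 manually, although a few details need adjusting.
First, download the required packages to /home/disk6/src; everything is unpacked under this path as well.
(PS: for the download URLs of all the companion software, see the README in the downloaded mirdeep2 directory.)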
bowtie #version 0.12.7
ViennaRNA-1.8.5.tar.gz
squid-1.9g.tar.gz
randfold-2.0.tar.gz
PDF-API2-0.73.tar.gz
perl # my version is 5.10.1
~~~~~~~~~~Install bowtie
unzip bowtie-0.12.7-linux-x86_64.zip
Unpacking yields ready-to-run binaries, so no compilation is needed.
Add the bowtie directory to PATH.
~~~~~~~~~Install ViennaRNA
tar -zxf ViennaRNA-1.8.5.tar.gz
cd ViennaRNA-1.8.5
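./configure --prefix=/home/disk6/tools/ViennaRNA # /home/disk6/tools/ is where I install software; I put commonly used tools here (or symlink them with ln -s into the matching directory under tools) and then add each one to PATH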
make
make install
~~~~~~~~~Install squid-1.9g.tar.gz and randfold-2.0.tar.gz
cd /home/disk6/src # return to the download directory before unpacking
tar -zxf squid-1.9g.tar.gz
cd squid-1.9g
./configure --prefix=/home/disk6/tools/squid # squid.h only exists after configure has run; randfold 2.0 below needs this file
make
make install
cd /home/disk6/src # return to the download directory before unpacking
tar -zxf randfold-2.0.tar.gz
cd randfold2.0
Edit the Makefile and replace the line INCLUDE=-I with INCLUDE=-I. -I/home/disk6/src/squid-1.9g/ -L/home/disk6/src/squid-1.9g/
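The Makefile edit above can also be scripted. The following is only a sketch: patch_randfold_makefile is a hypothetical helper name (not part of randfold), and it assumes the Makefile has exactly one line starting with INCLUDE=-I.

```shell
# Sketch: rewrite the INCLUDE line of randfold's Makefile.
# patch_randfold_makefile is a hypothetical helper, not shipped with randfold.
patch_randfold_makefile() {
    makefile=$1
    squid_dir=$2
    # Keep a .bak copy, then replace the whole line that starts with INCLUDE=-I
    cp "$makefile" "$makefile.bak" &&
    sed "s|^INCLUDE=-I.*|INCLUDE=-I. -I$squid_dir -L$squid_dir|" \
        "$makefile.bak" > "$makefile"
}

# Usage (run inside randfold2.0):
#   patch_randfold_makefile Makefile /home/disk6/src/squid-1.9g/
```

The original file stays available as Makefile.bak in case the pattern does not match what you expect.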
make
Add randfold to PATH.
~~~~~~~~~~~~Install PDF-API2-0.73.tar.gz
cd /home/disk6/src # return to the download directory before unpacking
tar -zxf PDF-API2-0.73.tar.gz
cd PDF-API2-0.73
mkdir ../mirdeep2/lib/ # do not forget this: mirdeep2 was unpacked at the very start; create a lib directory inside it
perl Makefile.PL PREFIX=/home/disk6/src/mirdeep2 LIB=/home/disk6/src/mirdeep2/lib
make
make test
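make install # at this point /home/disk6/src/mirdeep2/lib contains two directories, PDF and x86_64-linux-thread-multi
~~~~~~~~~~~~Configure PERL5LIB for mirdeep2 (that is, the PDF module just installed)
Add the following line to ~/.bash_profile:
export PERL5LIB=$PERL5LIB:/home/disk6/src/mirdeep2/lib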
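When PERL5LIB is empty, a plain append leaves a stray leading colon. A small sketch of a safer append follows; append_perl5lib is a hypothetical helper name, and the perl -M check in the comment assumes perl is on PATH.

```shell
# Sketch: append a directory to PERL5LIB without producing a leading ':'.
# append_perl5lib is a hypothetical helper name.
append_perl5lib() {
    # ${PERL5LIB:+...} expands to "$PERL5LIB:" only when PERL5LIB is non-empty
    PERL5LIB="${PERL5LIB:+$PERL5LIB:}$1"
    export PERL5LIB
}

# Usage:
#   append_perl5lib /home/disk6/src/mirdeep2/lib
#   perl -MPDF::API2 -e 1 && echo "PDF::API2 loads"   # quick check of the module
```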
~~~~~~~~~Test that all installed software works
to test if everything is installed properly type in
1) bowtie
2) RNAfold -h
3) randfold
4) make_html.pl
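Before running the four commands by hand, it can help to confirm that each one is found on PATH at all. A minimal sketch, where check_tools is a hypothetical helper name:

```shell
# Sketch: report which of the given commands can be found on PATH.
# check_tools is a hypothetical helper; it returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_tools bowtie bowtie-build RNAfold randfold make_html.pl
```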
~~~~~~~~~~Finally, add the miRDeep2 directory to PATH
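The PATH additions mentioned throughout the installation can be collected in ~/.bash_profile. This is a sketch only: prepend_path is a hypothetical helper, and the directories in the usage comment are guesses based on the paths used above, so adjust them to where the binaries actually ended up.

```shell
# Sketch: prepend a directory to PATH unless it is already listed.
# prepend_path is a hypothetical helper name.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, leave PATH unchanged
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Usage (directory names are assumptions, not taken from the README):
#   prepend_path /home/disk6/src/bowtie-0.12.7
#   prepend_path /home/disk6/tools/ViennaRNA/bin
#   prepend_path /home/disk6/src/randfold2.0
#   prepend_path /home/disk6/src/mirdeep2
```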
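II. Introduction to miRDeep2
The miRDeep2 folder ships with a tutorial; working through this example is a good way to learn miRDeep2.
The tutorial_dir folder contains the following files; .fa files are in fasta format.
cel_cluster.fa: # genome file of the species under study
mature_ref_this_species.fa: # mature miRNAs of the species under study, downloadable from miRBase
mature_ref_other_species.fa: # mature miRNAs of related species, downloadable from miRBase
precursors_ref_this_species.fa: # miRNA precursors of the species under study, downloadable from miRBase
reads.fa: # deep sequencing reads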
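~~~~~~~~~~Step 1~~~~~~~~~
# Build an index of the genome file with bowtie-build
bowtie-build cel_cluster.fa cel_cluster
# cel_cluster.fa is the genome file and cel_cluster is the prefix of the index files.
# The prefix can be any string; it does not have to match the genome file name.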
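To avoid rebuilding the index on every run, one can first check whether the index files for a given prefix already exist. The sketch below assumes the six .ebwt files that bowtie 0.12.x writes for small genomes; bowtie_index_exists is a hypothetical helper name.

```shell
# Sketch: succeed only if all six bowtie (version 1) index files exist
# for the given prefix. bowtie_index_exists is a hypothetical helper.
bowtie_index_exists() {
    prefix=$1
    for suffix in 1 2 3 4 rev.1 rev.2; do
        [ -f "$prefix.$suffix.ebwt" ] || return 1
    done
    return 0
}

# Usage:
#   bowtie_index_exists cel_cluster || bowtie-build cel_cluster.fa cel_cluster
```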
~~~~~~~~~~Step 2~~~~~~~~~
# Process the reads file and map it to the genome
perl mapper.pl reads.fa -c -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m -p cel_cluster \
    -s reads_collapsed.fa -t reads_collapsed_vs_genome.arf -v
Option details
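-c input file is in fasta format; related options are -a (seq.txt format), -b (qseq.txt format), -e (fastq format) and -d (config file)
-j remove entries with non-canonical letters (anything other than a,c,g,t,u,n,A,C,G,T,U,N)
-k clip the 3' adapter; followed by the adapter sequence, TCGTATGCCGTCTTCTGCTTGT in the example
-l discard reads shorter than the given length; the example discards reads shorter than 18 nt
-m collapse the reads
-p map the processed reads to a genome that has already been indexed, cel_cluster in the example
-s write the processed reads to the given file, reads_collapsed.fa in the example
-t write the mapping results to the given file, reads_collapsed_vs_genome.arf in the example
-v print progress to the screen. Note 1 shows the difference: with -v the screen shows not only the final summary but also each mapper action (discarding, clipping, collapsing, trimming); without -v only the summary is shown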
Options not used in the example
Processing/mapping options
-g prefix for read identifiers, seq by default; read names in the -s and -t output files start with these three letters
-h parse to fasta format
-i convert rna to dna alphabet (to map against genome)
-q map with one mismatch in the seed (mapping takes longer)
-r maximum number of genome positions a read may map to, default 5
-u do not remove the directory with temporary files
-n overwrite existing files
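After collapsing with -m, each fasta header in reads_collapsed.fa carries the number of times the sequence was observed. The sketch below assumes the header form prefix_index_xCOUNT (for example >seq_0_x5) and sums the counts; count_collapsed_reads is a hypothetical helper name.

```shell
# Sketch: summarize a collapsed fasta file, assuming headers such as
# >seq_0_x5 where the number after the last "_x" is the read count.
# count_collapsed_reads is a hypothetical helper.
count_collapsed_reads() {
    awk '/^>/ { n = split($1, part, "_x"); total += part[n]; unique++ }
         END  { printf "%d unique sequences, %d reads\n", unique, total }' "$1"
}

# Usage:
#   count_collapsed_reads reads_collapsed.fa
```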
~~~~~~~~~~Step 3~~~~~~~~~
# Fast quantitation of reads mapping to known miRBase precursors.
(This step is not required for identification of known and novel miRNAs in the deep sequencing data when using miRDeep2.pl.)
quantifier.pl -p precursors_ref_this_species.fa -m mature_ref_this_species.fa -r reads_collapsed.fa \
    -t cel -y 16_19
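Option details
-p miRNA precursor file, downloadable from miRBase
-m mature miRNA sequence file, downloadable from miRBase
-r reads file
-t species; if given, only data for that species are considered, otherwise everything is analyzed
-y [time] optional timestamp used in the output names (16_19 in the example); if omitted, a new one is generated
Output shown on screen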
getting samples and corresponding read numbers
seq 374333 reads
Converting input files
building bowtie index
mapping mature sequences against index
# reads processed: 174
# reads with at least one reported alignment: 6 (3.45%)
# reads that failed to align: 168 (96.55%)
Reported 6 alignments to 1 output stream(s)
mapping read sequences against index
# reads processed: 1505
# reads with at least one reported alignment: 1088 (72.29%)
# reads that failed to align: 417 (27.71%)
Reported 1099 alignments to 1 output stream(s)
analyzing data
6 mature mappings to precursors
Expressed miRNAs are written to expression_analyses/expression_analyses_16_19/miRNA_expressed.csv
not expressed miRNAs are written to expression_analyses/expression_analyses_16_19/miRNA_not_expressed.csv
Creating miRBase.mrd file
after READS READ IN thing
make_html2.pl -q expression_analyses/expression_analyses_16_19/miRBase.mrd -k mature_ref_this_species.fa -z -t C.elegans -y 16_19 -o -i expression_analyses/expression_analyses_16_19/mature_ref_this_species_mapped.arf -l -m cel
miRNAs_expressed_all_samples_16_19.csv
miRNAs_expressed_all_samples_16_19.csv file with miRNA expression values
parsing miRBase.mrd file finished
creating PDF files
creating pdf for cel-mir-39 finished
creating pdf for cel-mir-40 finished
creating pdf for cel-mir-37 finished
creating pdf for cel-mir-36 finished
creating pdf for cel-mir-38 finished
creating pdf for cel-mir-41 finished
#
This produces several outputs: expression_16_19.html, the expression_analyses folder (which contains many files), miRNAs_expressed_all_samples_16_19.csv, and the pdfs_16_19 folder.
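For a quick look at the most abundant miRNAs without opening the HTML report, the expression table can be sorted on the command line. This sketch assumes the file is tab-separated, has a header line starting with #, and holds the read count in the second column; check the header of your own file first. top_expressed is a hypothetical helper name.

```shell
# Sketch: list the N rows with the highest value in column 2.
# Assumes a tab-separated table whose header line starts with '#'.
# top_expressed is a hypothetical helper.
top_expressed() {
    grep -v '^#' "$1" | sort -t "$(printf '\t')" -k2,2nr | head -n "${2:-10}"
}

# Usage:
#   top_expressed miRNAs_expressed_all_samples_16_19.csv 5
```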
~~~~~~~~~~Step 4~~~~~~~~~
# Identify known and novel miRNAs in the deep sequencing data
miRDeep2.pl reads_collapsed.fa cel_cluster.fa reads_collapsed_vs_genome.arf \
    mature_ref_this_species.fa mature_ref_other_species.fa precursors_ref_this_species.fa -t C.elegans \
    2> report.log
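# reads_collapsed.fa: the reads processed by mapper.pl
# cel_cluster.fa: the genome file
# reads_collapsed_vs_genome.arf: the mapping results
# mature_ref_this_species.fa: mature miRNAs of the species under study, downloadable from miRBase
# mature_ref_other_species.fa: mature miRNAs of related species, downloadable from miRBase
# precursors_ref_this_species.fa: miRNA precursors of the species under study, downloadable from miRBase
# If you only have the reads, the arf file and the genome file, pass none in place of each missing file (miRNAs_ref/none miRNAs_other/none precursors/none): no mature miRNAs for this species, none for related species, and no precursors.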
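The substitution of none for missing reference files can be automated. A minimal sketch, where ref_or_none is a hypothetical helper and the file names in the usage comment are the tutorial ones:

```shell
# Sketch: print the path if the file exists and is non-empty,
# otherwise print the literal word none.
# ref_or_none is a hypothetical helper, not part of miRDeep2.
ref_or_none() {
    if [ -s "$1" ]; then
        echo "$1"
    else
        echo none
    fi
}

# Usage:
#   miRDeep2.pl reads_collapsed.fa cel_cluster.fa reads_collapsed_vs_genome.arf \
#       "$(ref_or_none mature_ref_this_species.fa)" \
#       "$(ref_or_none mature_ref_other_species.fa)" \
#       "$(ref_or_none precursors_ref_this_species.fa)" 2> report.log
```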
Option notes
-t species
2> report.log redirects standard error to report.log, so the record of every step ends up in that file
# Screen output
#####################################
# #
# miRDeep2 #
# #
# last change: 07/07/2011 #
# #
#####################################
miRDeep2 started at 19:44:43
#Starting miRDeep2
#testing input files
#Quantitation of known miRNAs in data
#parsing genome mappings
#excising precursors
#preparing signature
#folding precursors
#computing randfold p-values
#running miRDeep core algorithm
#running permuted controls
#doing survey of accuracy
#producing graphic results
miRDeep runtime:
started: 19:44:43
ended: 19:46:15
total:0h:1m:32s
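~~~~~~~~~~Step 5~~~~~~~~~
# Browse the results
Open the .html file in a web browser.
Note that cel-miR-37 is predicted twice, because two potential precursors at this locus can both fold into a hairpin. However, the annotated hairpin scores far higher than the unannotated one (miRDeep2 score 6.1e+4 vs. -0.2).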
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~Note 1~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
######With -v, the screen output is as follows####
discarding sequences with non-canonical letters
clipping 3' adapters
discarding short reads
collapsing reads
mapping reads to genome index
# reads processed: 1609
# reads with at least one reported alignment: 470 (29.21%)
# reads that failed to align: 1139 (70.79%)
Reported 480 alignments to 1 output stream(s)
trimming unmapped nts in the 3' ends
######Without -v, the screen output is as follows####
# reads processed: 1609
# reads with at least one reported alignment: 470 (29.21%)
# reads that failed to align: 1139 (70.79%)
Reported 480 alignments to 1 output stream(s)
~~~~~~~~~~~~~~End of Note 1~~~~~~~~~~~~~~~~~~