What
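RepeatMasker is a library-based tool that identifies repetitive sequences by similarity search. It masks transposon-derived repeats and low-complexity regions in a sequence (by default replacing them with N), works for almost any species, and is an essential tool for genome and non-coding RNA analysis. In human genome analysis, about 56% of the sequence is masked. For the sequence comparison, RepeatMasker can use one of several common search engines: nhmmer, cross_match, ABBlast/WUBlast, RMBlast, and Decypher (several engines can be installed, but only one is used per run).
Repbase, created and maintained by the Genetic Information Research Institute (GIRI), is a database of transposable elements and other repetitive sequences together with their annotations.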
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). Currently over 56% of human genomic sequence is identified and masked by the program. Sequence comparisons in RepeatMasker are performed by one of several popular search engines including nhmmer, cross_match, ABBlast/WUBlast, RMBlast and Decypher. RepeatMasker makes use of curated libraries of repeats and currently supports Dfam ( profile HMM library derived from Repbase sequences ) and Repbase, a service of the Genetic Information Research Institute.
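Since masking replaces repeats with N by default, a quick way to see how much of a sequence was masked is to count the N bases in a FASTA file. The sketch below is only an illustration: `masked_fraction` is a hypothetical helper name, and the figure it prints also includes any N that was already in the input (for example assembly gaps), so it is an upper bound on what RepeatMasker itself masked.

```shell
# Print the percentage of bases that are N (hard-masked) in a FASTA file.
# masked_fraction is a hypothetical helper, not part of RepeatMasker.
masked_fraction() {
    awk '
        /^>/ { next }                       # skip FASTA header lines
        {
            total  += length($0)            # every base on this line
            masked += gsub(/[Nn]/, "")      # gsub returns the number of N removed
        }
        END {
            if (total == 0) { print "0.00"; exit }
            printf "%.2f\n", 100 * masked / total
        }
    ' "$1"
}

# Usage (file name is a placeholder):
#   masked_fraction test.fa.masked
```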
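Online service
- RepeatMasker offers an online service: upload a nucleotide sequence or a FASTA file, choose the search engine, speed/sensitivity, species (DNA source), and output format, then submit. Results come back within a few minutes, which makes it a very handy tool.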
- Search Engine
  - abblast
  - rmblast
  - hmmer
  - cross_match
- Speed/Sensitivity
  - rush
  - quick
  - default
  - slow
- DNA source
  - Human
  - Mouse
  - Arabidopsis
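The web form options have command-line counterparts in a local install. As far as I know, the engine is chosen with `-e`, the speed settings are `-qq` (rush), `-q` (quick) and `-s` (slow), and the DNA source is given with `-species`; it is worth confirming these against `RepeatMasker -h` for your version. The sketch below only prints the resulting command line, and both function names are hypothetical helpers.

```shell
# Map the web form's speed/sensitivity names to RepeatMasker flags.
# rm_speed_flag and build_rm_command are hypothetical helpers.
rm_speed_flag() {
    case "$1" in
        rush)    printf '%s\n' "-qq" ;;
        quick)   printf '%s\n' "-q"  ;;
        default) printf '\n'         ;;   # default speed needs no flag
        slow)    printf '%s\n' "-s"  ;;
        *)       echo "unknown speed: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the command for an engine, speed, species and input file.
build_rm_command() {
    engine=$1; speed=$2; species=$3; fasta=$4
    flag=$(rm_speed_flag "$speed") || return 1
    # ${flag:+ $flag} adds the speed flag only when it is non-empty
    echo "RepeatMasker -e $engine${flag:+ $flag} -species $species $fasta"
}

# Usage:
#   build_rm_command rmblast slow human test.fa
```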
Installing RepeatMasker locally
A local installation needs, in addition to the RepeatMasker program itself, TRF (Tandem Repeats Finder), a sequence search engine (RMBlast is used here), and the Repbase database.
- TRF
wget http://tandem.bu.edu/trf/downloads/trf407b.linux
sudo mv trf407b.linux /usr/local/bin/trf # remember this location (path 1)
sudo chmod +x /usr/local/bin/trf # make the downloaded binary executable
- RMBlast
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/rmblast/2.2.28/ncbi-rmblastn-2.2.28-src.tar.gz
tar -zxvf ncbi-rmblastn-2.2.28-src.tar.gz
cd ncbi-rmblastn-2.2.28-src/c++
./configure --with-mt --prefix=/usr/local/rmblast --without-debug
make
sudo make install
# remember the RMBlast location (path 2), */ncbi-rmblastn-2.2.28-src/c++/GCC480-ReleaseMT64/bin
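Before moving on, it can save time to confirm that path 1 and path 2 really point at executables, since the RepeatMasker configure step asks for both. This is a minimal sketch: `check_tool` is a hypothetical helper, and `/usr/local/rmblast/bin/rmblastn` in the usage comment assumes that `make install` honoured the `--prefix` given above; substitute the build directory if that is where your binaries ended up.

```shell
# Report whether a tool exists at the given path and is executable.
# check_tool is a hypothetical helper name.
check_tool() {
    name=$1; path=$2
    if [ -x "$path" ] && [ ! -d "$path" ]; then
        echo "ok: $name at $path"
    else
        echo "missing or not executable: $name at $path" >&2
        return 1
    fi
}

# Usage (paths are assumptions, adjust to your system):
#   check_tool trf /usr/local/bin/trf
#   check_tool rmblastn /usr/local/rmblast/bin/rmblastn
```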
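- Repbase
Downloading Repbase requires registering on the official site. Commercial organizations pay a fee, non-profit organizations can use it for free, and each request is approved manually. Copies can also be found through Google or Baidu. After downloading, extract the archive and keep it ready for later.
- RepeatMasker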
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-6.tar.gz
tar -zxvf RepeatMasker-open-4-0-6.tar.gz # extracts into ./RepeatMasker
cd RepeatMasker
perl configure
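<PRESS ENTER TO CONTINUE> # press Enter to continue
Enter path [ ]: # path to the perl interpreter
Enter path [ ]: # path where RepeatMasker is installed
Enter path [ ]: # path to TRF (path 1)
Add a Search Engine: # choose a search engine (it must already be installed) and enter its path (path 2)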
1. CrossMatch: [ Un-configured ]
2. RMBlast - NCBI Blast with RepeatMasker extensions: [ Un-configured ]
3. WUBlast/ABBlast (required by DupMasker): [ Un-configured ]
4. HMMER3.1 & DFAM: [ Un-configured ]
5. Done
Do you want RMBlast to be your default # set the default search engine
search engine for Repeatmasker? (Y/N) [ Y ]:
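# several engines can be configured; enter 5 when done
Congratulations! RepeatMasker is now ready to use. # installation finished
# RepeatMasker is now installed. Next, copy the Repbase files that were downloaded and extracted earlier into the Libraries folder under the RepeatMasker installation path.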
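The copy step can be sketched as a small function. Everything here is an assumption about your layout: `install_repbase` is a hypothetical helper, the source directory is wherever you extracted Repbase, and the archive you obtained may already contain its own Libraries folder, in which case point the first argument at that folder. Depending on the RepeatMasker version, `perl configure` may need to be re-run afterwards so the new library is picked up.

```shell
# Copy extracted Repbase library files into <RepeatMasker dir>/Libraries.
# install_repbase is a hypothetical helper; both arguments are placeholders.
#   $1: directory holding the extracted Repbase library files
#   $2: RepeatMasker installation directory
install_repbase() {
    src=$1
    dest=$2/Libraries
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest"
    # "$src"/. copies the directory contents rather than the directory itself
    cp -R "$src"/. "$dest"/
    ls "$dest"                      # show what is now in Libraries
}

# Usage (paths are placeholders):
#   install_repbase ~/downloads/repbase ~/RepeatMasker
```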
- Simple use
RepeatMasker -species human test.fa
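A run like the one above normally leaves `test.fa.masked` (the masked sequence), `test.fa.out` (the annotation table) and `test.fa.tbl` (a summary) next to the input. As a rough way to look at the annotation, the sketch below counts hits and covered bases per repeat class from the `.out` file. It assumes the usual layout in which column 11 is the repeat class/family and columns 6 and 7 are the begin and end positions in the query; `summarize_repeats` is a hypothetical helper, and overlapping hits are counted twice, so the totals are only approximate.

```shell
# Summarize a RepeatMasker .out file: class/family, number of hits, bases covered.
# summarize_repeats is a hypothetical helper, not part of RepeatMasker.
summarize_repeats() {
    awk '
        $1 ~ /^[0-9]+$/ {               # data rows start with the numeric SW score
            n[$11]++                    # column 11: repeat class/family
            bp[$11] += $7 - $6 + 1      # columns 6 and 7: begin and end in query
        }
        END {
            for (c in n) printf "%s\t%d\t%d\n", c, n[c], bp[c]
        }
    ' "$1" | LC_ALL=C sort              # stable, locale-independent ordering
}

# Usage:
#   summarize_repeats test.fa.out
```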