Gene Prediction with MAKER

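MAKER combines repeat masking, ab initio prediction with AUGUSTUS, GeneMark, and SNAP, homology-based annotation, and RNA-seq evidence. It is not a gene prediction method in its own right; it is a pipeline that integrates these methods.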

1. Installation

conda create -n maker -c bioconda maker

Once installation finishes, MAKER only needs to be configured.
MAKER can also be installed without conda.
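Whichever way MAKER was installed, it is worth confirming that the programs it calls are on PATH before configuring anything. The helper below is a minimal sketch of such a check; `check_tools` is a hypothetical name, and the tool list in the usage comment is an assumption about which programs a typical run needs.

```shell
# Report which of the given executables can be found on PATH.
# Returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found   $tool"
        else
            echo "missing $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Typical use after `conda activate maker` (tool list is an assumption):
#   check_tools maker RepeatMasker augustus snap exonerate
```

Anything reported as missing can either be installed or simply left unused in the control files.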

2. Basic usage

Generate the three control files:

maker -CTL

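Then edit them to configure the run.

Control files

  • maker_bopts.ctl: BLAST and Exonerate filtering thresholds; usually needs no changes
  • maker_exe.ctl: paths to the executables MAKER calls
  • maker_opts.ctl: the main file to edit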
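Since each entry in these files is a `key=value` pair followed by a `#` comment, values can be set from a script instead of by hand. The following is a minimal sketch, not part of MAKER itself: `set_ctl_opt` is a hypothetical helper, and it assumes the value contains no `|` or `&` characters (both are special to `sed` here).

```shell
# Set key=value in a MAKER control file while keeping the trailing "#..." comment.
# A backup of the original file is written next to it with a .bak suffix.
set_ctl_opt() {
    key=$1
    value=$2
    file=$3
    # Replace everything between "key=" and the first "#" with the new value.
    sed -i.bak "s|^${key}=[^#]*|${key}=${value} |" "$file"
}

# Example (paths are placeholders):
#   set_ctl_opt genome /path/to/scaffolds.fasta maker_opts.ctl
#   set_ctl_opt est2genome 1 maker_opts.ctl
```

Because the pattern is anchored to the start of the line and includes the `=`, setting `est` does not touch `est2genome` or `est_gff`.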

The parameters in maker_opts.ctl look roughly like this:

#-----Genome (Required for De-Novo Annotation)
genome=/home/user/projects/thomas_the_train/assembly/scaffolds.fasta  
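## Reference genome sequence; either an absolute or a relative path works. The scaffold N50 should exceed the expected median gene length, and the sequences may contain only A, T, C, G, and N. It is best not to put a space before the path.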
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
## Defaults to eukaryotic
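The two genome requirements noted above (only A, T, C, G, N in the sequences, and a scaffold N50 above the expected median gene length) can be checked before starting a run. The functions below are a rough sketch using `awk`; the names `find_bad_bases` and `scaffold_n50` are hypothetical, and accepting lowercase bases (as found in soft-masked assemblies) is an assumption on my part.

```shell
# List FASTA sequence lines that contain characters other than A, C, G, T, N
# (upper or lower case). Prints "file:line" and returns non-zero if any are found.
find_bad_bases() {
    awk '!/^>/ && /[^ACGTNacgtn]/ { printf "%s:%d\n", FILENAME, FNR; bad = 1 }
         END { exit bad + 0 }' "$1"
}

# Print the scaffold N50 of a FASTA file: the length of the scaffold at which
# the running total, counted from the longest scaffold down, reaches half the
# assembly size.
scaffold_n50() {
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1" |
    sort -rn |
    awk '{ lens[NR] = $1; total += $1 }
         END {
             half = total / 2
             for (i = 1; i <= NR; i++) {
                 run += lens[i]
                 if (run >= half) { print lens[i]; exit }
             }
         }'
}

# Example (path is a placeholder):
#   find_bad_bases scaffolds.fasta && scaffold_n50 scaffolds.fasta
```

The N50 that comes out still has to be compared by eye against whatever median gene length is expected for the organism.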

#-----Re-annotation Using MAKER Derived GFF3
## Usually left unchanged

#-----EST Evidence (for best results provide a file for at least one)
# Predict genes from transcriptome or EST sequences
est=/home/user/projects/thomas_the_train/trinity/funnel.fasta,../trinity/coaltender.fasta #non-redundant set of assembled ESTs in fasta format (classic EST analysis)
# ESTs or assembled RNA-seq transcripts in FASTA format; separate multiple files with ','
altest= #EST/cDNA sequence file in fasta format from an alternate organism
# Use altest above when no transcriptome data are available for the species itself
est_gff=../cufflinks/boiler.gff,../cufflinks/brake.gff #EST evidence from an external gff3 file
# GFF files of transcript evidence, obtainable from Cufflinks or StringTie (both write GTF, so convert to GFF3 first)
altest_gff= #Alternate organism EST evidence from a separate gff3 file

#-----Protein Homology Evidence (for best results provide a file for at least one)
# MAKER uses Exonerate to build gene models from protein alignments. Choose high-quality proteins such as UniProt/Swiss-Prot, or proteins from annotations with AED < 0.5
protein=../protein/swiss_prot.fasta  #protein sequence file in fasta format
protein_gff=  #protein homology evidence from an external gff3 file
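For the AED < 0.5 option mentioned in the comment above, one way to pick candidates is to filter the mRNA records of an earlier MAKER GFF3 by their `_AED` attribute. This is only a sketch under assumptions: `filter_by_aed` is a hypothetical helper, the input is assumed to be a tab-separated MAKER GFF3 whose mRNA lines carry `_AED=<value>`, and the protein sequences for the selected IDs still have to be extracted separately.

```shell
# Print mRNA features from a MAKER GFF3 whose _AED value is below a cutoff.
# Usage: filter_by_aed annotations.gff [cutoff]   (cutoff defaults to 0.5)
filter_by_aed() {
    gff=$1
    cutoff=${2:-0.5}
    awk -F '\t' -v cutoff="$cutoff" '
        $3 == "mRNA" && match($9, /_AED=[0-9.]+/) {
            # Skip the 5 characters of "_AED=" to get the numeric value.
            aed = substr($9, RSTART + 5, RLENGTH - 5)
            if (aed + 0 < cutoff + 0) print
        }' "$gff"
}
```

Lower AED means better agreement between a gene model and its supporting evidence, so a stricter cutoff can be passed as the second argument.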

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker
# 'all' covers animals, grasses, primates, and so on; you can also pick a specific organism such as Drosophila
rmlib=../repeat_lib/thomas_TEs.fasta #provide an organism specific repeat library in fasta format for RepeatMasker
# Repeat library; can be built with RepeatModeler
repeat_protein=/Users/mcampbell/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner
# Known TE proteins
rm_gff= #repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to run repeat masking on prokaryotes (don't change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
# Choose which gene models to predict with; this is very flexible
snaphmm=../trained_snap/thomas_1.hmm #SNAP HMM file
# Species-specific, or from a closely related species; accepts multiple values
gmhmm=../train_genemark/es.mod #GeneMark HMM file
# Also accepts multiple values
augustus_species=steam_tram #Augustus gene prediction species model
# Works well, but is hard to train
fgenesh_par_file= #Fgenesh parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
# High-quality gene models
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
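# Enable this when no trained gene prediction model exists. Typically used for the first round of prediction and switched off afterwards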
protein2genome=0 #gene prediction from protein homology, 1 = yes, 0 = no
# Same as est2genome
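With the control files filled in, a round of annotation can be started from the directory that holds them. The wrapper below is a sketch, not an official recipe: `run_maker_round` and the run label are hypothetical, and it assumes `maker`, `gff3_merge`, and `fasta_merge` from the MAKER installation are on PATH.

```shell
# Run one MAKER round using the three control files in the current directory,
# then merge the per-scaffold results into combined GFF3 and FASTA files.
run_maker_round() {
    base=$1   # hypothetical run label, e.g. "round1"
    maker -base "$base" maker_opts.ctl maker_bopts.ctl maker_exe.ctl || return 1
    # MAKER writes its results under <base>.maker.output and indexes them here.
    index="${base}.maker.output/${base}_master_datastore_index.log"
    gff3_merge -d "$index" || return 1
    fasta_merge -d "$index"
}

# Example:
#   run_maker_round round1
```

A first round would typically use est2genome=1 and protein2genome=1; once gene predictors have been trained on that output, later rounds can set both back to 0 and use a different label.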
