Building a phylogenetic tree for a gene family with the Linux version of MEGA

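I recently got a request to build a phylogenetic tree for a gene family and to classify its members by their evolutionary relationships. It reminded me of a class where a lecturer who works on evolution told us that, unless your project is purely about evolution, MEGA is entirely sufficient. Memory and other resources are limited on Windows: a few dozen genes are manageable, but a hundred or more are too much, so I decided to run MEGA on Linux instead.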

1. Download link

https://www.megasoftware.net/
It ships as a prebuilt binary, so you only need to unpack it and add its directory to your PATH environment variable.
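A minimal sketch of the PATH step. The install directory `$HOME/tools/megacc` and the helper name `add_to_path` are assumptions for illustration; point `MEGA_DIR` at wherever you actually unpacked the download.

```shell
# Hypothetical install location; adjust to wherever you unpacked the archive.
MEGA_DIR="${MEGA_DIR:-$HOME/tools/megacc}"

# Prepend a directory to PATH only if it is not already there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

add_to_path "$MEGA_DIR"

# Report whether the shell can now find megacc.
if command -v megacc >/dev/null 2>&1; then
    echo "megacc found at: $(command -v megacc)"
else
    echo "megacc not found; check that $MEGA_DIR contains the binary" >&2
fi
```

To make the change permanent, the same lines can go into a shell startup file such as ~/.bashrc.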

2. Getting the gene family

See my earlier post for the details: https://www.jianshu.com/p/5fd60c818651

3. Multiple sequence alignment

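The previous step gave me the protein sequences of every member of the gene family. I then aligned them with the MUSCLE algorithm in MEGA on Windows [Align > Build Alignment > load the protein sequences from the previous step > align with MUSCLE > Data > Export > FASTA format].
This gave me the aligned file multiproteins.fas.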
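Before building a tree, it can be worth confirming that the exported file really is an alignment, meaning every sequence has the same length once gaps are counted. The following is a sketch of such a check; the function name `check_alignment` is my own and is not part of MEGA.

```shell
# Print "name<TAB>aligned length" for every sequence in a FASTA file and
# return non-zero if the lengths are not all identical.
check_alignment() {
    awk '
        function report() {
            if (name == "") return
            n++
            print name "\t" len
            if (n == 1) { first = len } else if (len != first) { bad = 1 }
        }
        # Header line: finish the previous record and start a new one.
        # A trailing carriage return (file exported on Windows) is stripped.
        /^>/ { report(); name = substr($0, 2); sub(/\r$/, "", name); len = 0; next }
        # Sequence line: ignore whitespace and carriage returns, count the rest.
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { report(); exit (bad ? 1 : 0) }
    ' "$1"
}
```

For example, `check_alignment multiproteins.fas || echo "lengths differ"` lists every sequence with its length and prints the warning only when the file is not a consistent alignment.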

4. Building the tree with MEGA on Linux

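First, a look at the parameters. Compared with other software, I find this one fairly easy to understand and to get started with.
The .mao file is especially important here, because it tells MEGA-CC which analysis to run and with which options. The simplest way to create it is to set it up on Windows; for details see this post from 组学大讲堂 (OmicsClass).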
https://www.omicsclass.com/article/568

Version information

MEGA version 10.1.8
For 64-bit Linux
Build 10200331

Usage: megacc -a /pathTo/analysisFile.mao -d /pathTo/dataFile.meg [-t /pathTo/treeFile.nwk][OPTIONS]

Parameter reference

   -a --analysisOptions
       MEGA Analysis Options File     *required*
       Specify the full path to the Mega Analysis Options (.mao) file.
       This file tells MEGA-CC which analysis to perform as well as which options to use
# Calibration file
   -c --calibration
       Calibration file *optional*
       Specify the full path to a calibration file that you wish to use. The calibration 
       file is used to provide calibration data for tree timing methods. 
   -ca --concatenate-alignments
       Command to concatenate multiple sequence alignments into a single sequence alignment file
       The files to be concatenated should either be located in a directory specified by the -d option
       or the -l option can be used to specify the files to concatenate via a text file that has the
       full path to each file on a separate line
   -d --data 
       Data File         *required*
       Specify the full  or relative path to the data file you wish to 
       analyze.  MEGA (.meg), and Fasta files are supported for 
       all analyses. For distance matrices the MEGA (.meg) format is required.
   -f --format *applies to sequence alignment only*
       Export format for sequence alignment
       Sequence alignments can be exported in either the native .meg
       or FASTA format.
       Format values:
         MEGA
         Fasta
   -g --groups 
       Groups file *optional*
       Specify the full path to the groups file that you wish to use. This file organizes 
       taxa into groups where each line in the file is a key value pair of the form 
             taxonName=groupName 
       Group information is used for certain analyses, for instance, specifying which taxon/taxa 
       comprise the outgroup for the timetree analysis 

   -gs --gap-symbol
       The character that represents indels in the sequence data file that is being analyzed
       If this is provided, it will override the value that is provided in the .mao file

   -h --help 
       Help
       Prints this help file document

   -is --identical-base-symbol
       The character that represents identical bases in the sequence data file that is being analyzed
       If this is provided, it will override the value that is provided in the .mao file
# Use this option when you have several separate sequence files to analyze.
   -l --list
       Input File List
       Specifies a text file which has a list of input data files to be analyzed.  This option can be  used
       instead of -d or -t to specify input data, in which case, the same analysis will be performed on all
       input files listed in the text file and each output results files will be named using  the  name  of
       its corresponding input file.  The indicated text file must be formatted such that each line has the
       full path to the sequence data file to be used and if a tree file is also provided it is on the same
       line but separated by a two pipe characters (e.g. || ).  See EXAMPLES and LIST FORMAT below.

   -ms --missing-base-symbol
       The character that represents missing bases in the sequence data file that is being analyzed
       If this is provided, it will override the value that is provided in the .mao file

   -n --noSummary: Do not write out the analysis summary file
       By default a file that gives an analysis summary is written.
       This option suppresses the export of that file. However, if any important messages are generated by the application, they will be written to this file regardless.

   -o --outfile     *optional*
       Output Path / Output Dir
       Specify the full path and base filename (e.g. /myResultsDirectory/myResultName) or
       simply the full path and directory of where to save the file
       (e.g. /myResultsDirectory) in which case, a unique filename will be chosen automatically for you.
   -pfc --partition-frequency-cutoff
       Partition Frequency Cutoff (a value between 0.0 and 1.0 - default is 0.5) *optional*
       When bootstrapping is used for tree construction a list of partitions and frequencies is written to a text file. The partition frequency cutoff causes any partitions whose frequency is less than the cutoff value to be ommited from this text file. Set this value to 0.0 to include all partitions.
   -r --recursive
       Recursive directory search *optional*
       If a directory is specified for analysis by default MEGA only searches
       the contents of that folder and not any of it's children.  To include the
       contents of all folders under the one specified, use this option.

   -s --silent: Do not write out the progress updates
       This option prevents progress updates from being written to stdout.

   -t --tree     *required for some analyses*
       Tree File
       Specify the full path to the tree file you wish to use. (Some analyses requires a user provided tree, or allow you to provide your own)

  If no output path is specified, results will be saved in the same directory
  as the input data file, with a unique name.

EXAMPLES

This example performs a multiple sequence alignment on codons (it assumes that you have created the file "Clustal_Codon_Alignment.mao"using the prototyper (megaproto). A fasta file with coding data is used as input and the resulting alignment is output in the MEGA format:

  megacc -a ~/Documents/Clustal_Codon_Alignment.mao -d ~/Documents/codingData.fas -o ~/Documents/codingDataAligned.meg

This example shows how to construct a neighbor-joining phylogeny for each of a list of sequence data files.
The analysis will be performed for each file listed in "listOfDataFiles.txt" and all results will be written to
the ~/Documents/outputDirectory/ directory:
megacc -a ~/Documents/NJ_Tree_Settings.mao -l ~/Documents/listOfDataFiles.txt -o ~/Documents/outputDirectory/

LIST FORMAT
When using the -l option, each file to be analyzed must be on its own line. For example:
~/Documents/myData/seqData1.fas
~/Documents/myData/seqData2.fas
~/Documents/myData/seqData3.fas

If the analyses are to use a user-provided Newick tree file, then the tree files are given on the same line as the data files, following two pipe characters. For example:
~/Documents/myData/seqData1.fas || ~/Documents/myData/treeFile1.nwk
~/Documents/myData/seqData2.fas || ~/Documents/myData/treeFile2.nwk
~/Documents/myData/seqData3.fas || ~/Documents/myData/treeFile3.nwk
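
Writing the list file by hand gets tedious with many inputs. Here is a sketch that generates it in the format described above, assuming the data files end in .fas and that a tree, if any, sits next to its data file with the same base name and a .nwk extension. The function name `make_list` and that naming convention are my assumptions, not something MEGA requires.

```shell
# Write one line per *.fas file in a directory, using the full path.
# If a Newick file with the same base name exists next to it, append it
# after the two-pipe separator, as described in LIST FORMAT.
make_list() {
    dir=$(cd "$1" && pwd) || return 1      # resolve to a full path
    for data in "$dir"/*.fas; do
        [ -e "$data" ] || continue          # no matches: the glob stays literal
        tree="${data%.fas}.nwk"
        if [ -e "$tree" ]; then
            printf '%s || %s\n' "$data" "$tree"
        else
            printf '%s\n' "$data"
        fi
    done
}
```

Running `make_list ~/Documents/myData > ~/Documents/listOfDataFiles.txt` produces a file that can then be passed to the -l option.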

The command I ended up using:

time nohup megacc -a infer_ML_amino_acid.mao -d multiproteins.fas -t multiproteins.nwk -o ./
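
Note that nohup on its own keeps the job alive after logout but does not put it in the background; that takes a trailing &. Below is a sketch of a small wrapper that checks the inputs, starts the run in the background and sends all output to a log file. The names `run_megacc_bg` and `MEGACC` are hypothetical, and the sketch passes only -a, -d and -o; add -t yourself if your analysis needs a tree file.

```shell
# MEGACC lets you point at a specific binary; it defaults to megacc on PATH.
MEGACC="${MEGACC:-megacc}"

# Start a MEGA-CC run detached from the terminal.
# Arguments: <analysis.mao> <data file> <output path> <log file>
run_megacc_bg() {
    mao=$1; data=$2; out=$3; log=$4
    for f in "$mao" "$data"; do
        if [ ! -f "$f" ]; then
            echo "missing input file: $f" >&2
            return 1
        fi
    done
    # nohup keeps the job alive after logout; & puts it in the background.
    nohup "$MEGACC" -a "$mao" -d "$data" -o "$out" >"$log" 2>&1 &
    echo "started $MEGACC (PID $!), log: $log"
}
```

A call such as `run_megacc_bg infer_ML_amino_acid.mao multiproteins.fas ./ megacc.log` returns immediately, and `tail -f megacc.log` shows the progress updates.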

Next I plan to use ggtree to polish the tree. I will post an update once I have learned how.
