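Overview
Use barrnap to predict the 16S rRNA genes in the bacterial genomes, take the longest 16S from the result file (its second line), and keep only 16S sequences longer than 1400 bp. Align the sequences with mafft, trim the alignment with trimal, and build the tree with fasttree. Visualize the tree with ggtree; figtree is also handy for a quick look at the tree file. The walkthrough below starts from the prepared 16S sequence file.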
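The length filter described in the overview (keep 16S sequences longer than 1400 bp) can be sketched in shell as follows. This is a minimal sketch, not the author's own script: the function name `filter_fasta_by_length` is hypothetical, and it assumes a plain FASTA file in which each record starts with a `>` header line. Each kept sequence is written on a single line.

```shell
# Keep only FASTA records whose sequence is strictly longer than MIN_LEN.
# Hypothetical helper, not part of barrnap, mafft, trimal, or fasttree.
# Usage: filter_fasta_by_length INPUT.fa MIN_LEN > OUTPUT.fa
filter_fasta_by_length() {
    awk -v min="$2" '
        # Print the buffered record if its sequence is longer than min.
        function emit() {
            if (header != "" && length(seq) > min) print header "\n" seq
        }
        # A header line closes the previous record and starts a new one.
        /^>/ { emit(); header = $0; seq = ""; next }
        # Sequence lines: drop whitespace and join wrapped lines.
        { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END { emit() }
    ' "$1"
}
```

For example, `filter_fasta_by_length all_16S.fa 1400 > bgi_illumina_16S.fa` would produce the input used in the alignment step, where `all_16S.fa` is an assumed name for the unfiltered barrnap-derived sequences.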
Papers
Title: MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability
Journal: Molecular Biology and Evolution
Year: 2013
Citations: 21366 (Google Scholar, 2021.11.24)
Title: trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses
Journal: Bioinformatics
Year: 2009
Citations: 4662 (Google Scholar, 2021.11.24)
Title: FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments
Journal: PLoS ONE
Year: 2010
Citations: 7840 (Google Scholar, 2021.11.24)
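Introduction
FastTree builds approximately maximum-likelihood phylogenetic trees. Its main strength is speed: according to its documentation it can handle alignments with up to about a million sequences. Its limitations are that it does not run a conventional bootstrap itself (by default it reports SH-like local support values instead) and that it supports only a small set of substitution models.
Official site: http://www.microbesonline.org/fasttree/
Choosing a substitution model:
FastTree builds trees from both nucleotide and protein alignments. For nucleotides, the available substitution models are JC (Jukes-Cantor) and GTR (generalized time-reversible), with JC as the default. For proteins, the available models are JTT (Jones-Taylor-Thornton 1992), LG (Le and Gascuel 2008), and WAG (Whelan & Goldman 2001), with JTT as the default. FastTree requires the input multiple sequence alignment to be in FASTA or Phylip format.
Source: https://www.omicsclass.com/article/1343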
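Installing the software
conda create -n xgene
conda activate xgene
# assumes the bioconda channel is already configured; otherwise add "-c bioconda"
conda install mafft trimal fasttree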
Downloading and Extracting Packages
mafft-7.487 | 3.0 MB | ##################################### | 100%
trimal-1.4.1 | 189 KB | ##################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
1 Prepare the input file
2 Align the sequences with mafft
Windows version: https://www.jianshu.com/p/d61cf5861e65
mafft --auto bgi_illumina_16S.fa > bgi_illumina_16S_align.fa
Log output:
generating a scoring matrix for nucleotide (dist=200)
Making a distance matrix
Constructing a UPGMA tree (efffree=0) ...
Progressive alignment 1/2...
Making a distance matrix from msa..
Constructing a UPGMA tree (efffree=1)
Progressive alignment 2/2...
disttbfast (nuc) Version 7.487
generating a scoring matrix for nucleotide (dist=200)
dndpre (nuc) Version 7.487
generating a scoring matrix for nucleotide (dist=200)
dvtditr (nuc) Version 7.487
Result:
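One way to sanity-check the alignment before trimming is to confirm that every aligned sequence has the same width, since trimal and fasttree both expect a rectangular alignment. The sketch below is an assumption-laden helper rather than part of mafft: the function name `check_alignment` is hypothetical, and it assumes aligned FASTA output such as the file written by the mafft command above.

```shell
# Report the number of sequences and the alignment width, and return a
# non-zero status if the aligned sequences differ in length.
# Hypothetical helper, not part of mafft.
# Usage: check_alignment ALIGNED.fa
check_alignment() {
    awk '
        # Close the record that was being read and compare its length
        # with the width of the first record.
        function record() {
            if (!seen) return
            n++
            if (n == 1) width = len
            else if (len != width) bad = 1
        }
        /^>/ { record(); seen = 1; len = 0; next }
        # Gap characters ("-") count towards the width; whitespace does not.
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END {
            record()
            printf "sequences=%d width=%d consistent=%s\n", n, width, (bad ? "no" : "yes")
            exit (bad ? 1 : 0)
        }
    ' "$1"
}
```

Running `check_alignment bgi_illumina_16S_align.fa` prints one summary line; the reported width is that of the first sequence, and `consistent=no` signals that at least one later sequence has a different length.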
3 Trim the alignment with trimal
Official site: http://trimal.cgenomics.org/
Download: http://trimal.cgenomics.org/downloads
Manual: http://trimal.cgenomics.org/getting_started_with_trimal_v1.2
trimal \
-in bgi_illumina_16S_align.fa \
-out bgi_illumina_16S_align_filter.fa \
-automated1
# Use a heuristic selection of the automatic method based on similarity statistics. (see User Guide). (Optimized for Maximum Likelihood phylogenetic tree reconstruction).
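4 Build the tree with fasttree
Tutorial (Bioinformatics Workbook, not the official FastTree site): https://bioinformaticsworkbook.org/phylogenetics/FastTree.html#gsc.tab=0
Workflow from that tutorial: sequences -> MAFFT -> FastTree -> FigTree -> pdf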
fasttree \
-nt bgi_illumina_16S_align_filter.fa \
> bgi_illumina_16S_align_filter.tree
Log output:
FastTree Version 2.1.10 Double precision (No SSE3)
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Jukes-Cantor, CAT approximation with 20 rate categories
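A quick check after tree building is to confirm that the number of tips in the Newick file matches the number of sequences in the trimmed alignment. The sketch below uses a hypothetical helper name, `count_newick_tips`, and relies on the fact that a single tree with N tips contains N - 1 commas. It assumes the file holds exactly one tree and that no tip label contains a comma, which should hold for typical accession-style sequence IDs but is not guaranteed in general.

```shell
# Count the tips of a single Newick tree as (number of commas + 1).
# Hypothetical helper; assumes one tree per file and no commas in labels.
# Usage: count_newick_tips TREE_FILE
count_newick_tips() {
    # tr -cd keeps only commas; the second tr strips padding that some
    # wc implementations add around the count.
    commas=$(tr -cd ',' < "$1" | wc -c | tr -d ' ')
    echo $((commas + 1))
}
```

The result of `count_newick_tips bgi_illumina_16S_align_filter.tree` can then be compared with `grep -c '^>' bgi_illumina_16S_align_filter.fa`, which counts the FASTA headers in the trimmed alignment.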
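5 Visualize with ggtree
library("ggplot2")
library("ggtree")
# Read the Newick tree written by fasttree
data = read.tree("bgi_illumina_16S_align_filter.tree")
tree = fortify(data)  # tidy data frame of the tree (not used below)
# `pal` is a metadata data frame that is not defined in this post: its first
# column must match the tip labels, and it needs `mark` and `Platform` columns.
gra3 =
ggtree(data, layout="fan", branch.length="none", size=0.8) %<+% pal +
geom_tiplab(aes(label = mark, col = Platform),
size=3) +
scale_color_manual(
values = c("BGI" = "orangered3",
"Illumina" = "deepskyblue3"))
ggsave("tree3.jpg", plot = gra3)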
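Building a tree with iqtree
mafft --auto Tree.fas > Tree.fas.mafft
# -m MFP: ModelFinder Plus model selection; -bb 1000: 1000 ultrafast bootstrap replicates;
# -bnni: refine bootstrap trees with NNI; -redo: overwrite earlier output; -o: outgroup taxon
iqtree -s Tree.fas.mafft -m MFP -bb 1000 -bnni -redo -o NC_010433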