The previous note covered 16S rDNA phylogenetic trees. For genome-level bacterial phylogeny, this note records the three different pipelines the following paper used to reconstruct phylogenetic structure.
Title: Extensive Unexplored Human Microbiome Diversity Revealed by Over 150,000 Genomes from Metagenomes Spanning Age, Geography, and Lifestyle
Journal: Cell
Year: 2019
Strategy 1 (Figure 1)
PhyloPhlAn software
Phylogeny built from the 400 universal PhyloPhlAn markers
phylophlan parameters: --diversity high --accurate --min_num_markers 80
Internal steps:
diamond (nucleotide input, translated search):
blastx --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0
diamond (protein input):
blastp --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0
mafft --anysymbol (alignment)
trimal -gappyout (trimming)
RAxML -m PROTCATLG -p 1989 (tree building)
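To make the effect of --id 50 and --min_num_markers 80 concrete, here is a minimal Python sketch of the filtering idea: read diamond tabular output (--outfmt 6, where column 2 is the subject ID and column 3 the percent identity), count the distinct markers each genome hits, and drop genomes below the threshold. This is only an illustration, not PhyloPhlAn's actual code. The genome names, marker IDs (p0001, p0002) and helper names are hypothetical, and the toy run uses a threshold of 2 instead of 80. Since diamond already applies --id 50, the identity check here is redundant in practice and is kept only to show the rule.

```python
MIN_IDENTITY = 50.0    # mirrors diamond --id 50
MIN_NUM_MARKERS = 80   # mirrors phylophlan --min_num_markers 80


def markers_hit(outfmt6_text, min_identity=MIN_IDENTITY):
    """Return the set of subject IDs (column 2) hit at >= min_identity (column 3)."""
    markers = set()
    for line in outfmt6_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        sseqid, pident = fields[1], float(fields[2])
        if pident >= min_identity:
            markers.add(sseqid)
    return markers


def keep_genomes(hits_by_genome, min_num_markers=MIN_NUM_MARKERS):
    """Keep genomes that map to at least min_num_markers distinct markers."""
    return sorted(
        genome
        for genome, text in hits_by_genome.items()
        if len(markers_hit(text)) >= min_num_markers
    )


def row(qseqid, sseqid, pident):
    """Build one 12-column outfmt 6 line; only the first three columns matter here."""
    rest = ["100", "5", "0", "1", "300", "1", "100", "1e-30", "180"]
    return "\t".join([qseqid, sseqid, str(pident)] + rest)


# Hypothetical toy data: genomeB hits p0002 only at 41% identity.
toy = {
    "genomeA": "\n".join([row("c1", "p0001", 92.1), row("c2", "p0002", 75.0)]),
    "genomeB": "\n".join([row("c1", "p0001", 88.0), row("c1", "p0002", 41.0)]),
}
kept = keep_genomes(toy, min_num_markers=2)
print(kept)
```

With the toy threshold of 2, only genomeA is kept, because genomeB reaches a single marker above 50% identity.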
Strategy 2 (Figure S3)
PhyloPhlAn software
Phylogeny reconstructed from the 400 PhyloPhlAn markers
phylophlan parameters: --diversity high --fast --min_num_markers 80
Internal steps:
diamond (nucleotide input, translated search):
blastx --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0
diamond (protein input):
blastp --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0
mafft --anysymbol (alignment)
trimal -gappyout (trimming)
RAxML -m PROTCATLG -p 1989 (tree building)
IQ-TREE -nt AUTO -m LG (tree building)
Strategy 3 (Figure 3)
Roary identifies the set of core genes at a 95% identity threshold (-i 95)
roary -e -n -v -p 4 -i 95 \
-f ./result_roary/ \
./out/*.gff
PhyloPhlAn parameters: --diversity low --fast
--min_num_markers <50% of the number of core genes identified>
--min_num_entries <90% of the number of input genomes>
Relevant option descriptions from the PhyloPhlAn help:
--diversity {low,medium,high}
Specify the expected diversity of the phylogeny,
automatically adjust some parameters: "low": for
genus-/species-/strain-level phylogenies; "medium":
for class-/order-level phylogenies; "high": for
phylum-/tree-of-life size phylogenies (default: None)
--fast Perform more a faster phylogeny reconstruction by
reducing the phylogenetic positions to use; affected
parameters depend on the "--diversity" level (default:
False)
--min_num_markers MIN_NUM_MARKERS
Input genomes or proteomes that map to less than the
specified number of markers will be discarded
(default: 1)
--min_num_entries MIN_NUM_ENTRIES
The minimum number of entries to be present for each
of the markers in the database (default: 4)
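The two placeholder thresholds for Strategy 3 (50% of the core genes, 90% of the input genomes) are simple arithmetic, sketched below in Python. The function name is hypothetical, and rounding down is an assumption: the notes do not say how fractional values were rounded. The example counts (1200 core genes, 40 genomes) are invented for illustration.

```python
def strategy3_thresholds(n_core_genes, n_genomes):
    """Return PhyloPhlAn option strings for the Strategy 3 thresholds.

    Assumption: fractional values are rounded down. Integer arithmetic
    is used to avoid floating-point surprises.
    """
    min_num_markers = n_core_genes * 50 // 100   # 50% of the core genes
    min_num_entries = n_genomes * 90 // 100      # 90% of the input genomes
    return [
        "--diversity", "low",
        "--fast",
        "--min_num_markers", str(min_num_markers),
        "--min_num_entries", str(min_num_entries),
    ]


# Hypothetical example: 1200 core genes found by Roary, 40 input genomes.
options = strategy3_thresholds(1200, 40)
print(" ".join(options))
```

For these example counts the thresholds come out as 600 markers and 36 entries.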
Internal steps:
blastn -outfmt 6 -max_target_seqs 1000000 (marker mapping)
mafft --anysymbol --auto (alignment)
trimal -gappyout (trimming)
FastTree -mlacc 2 -slownni -spr 4 -fastest -mlnni 4 -no2nd -gtr -nt (initial tree)
RAxML -p 1989 -m GTRCAT (tree refinement)
-t <phylogenetic tree computed by FastTree>
NMDS based on Roary genetic distances
The non-metric multidimensional scaling plots were computed on pairwise genetic distances between core gene alignments produced by Roary, using the nmds function in the ecodist R package.
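As a rough illustration of what "pairwise genetic distances" from a core gene alignment can look like, the Python sketch below computes a p-distance matrix (the proportion of differing sites) from aligned sequences. The choice of p-distance and the skipping of gap and N positions are assumptions; the notes do not state which distance measure the paper used. The genome names and sequences are toy data. The resulting matrix is the kind of input an NMDS routine, such as the ecodist nmds function mentioned above, would take.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping positions with a gap or N."""
    compared = 0
    diffs = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else float("nan")


def distance_matrix(alignment):
    """alignment: dict of name -> aligned sequence (all of equal length)."""
    names = sorted(alignment)
    matrix = [
        [p_distance(alignment[a], alignment[b]) for b in names]
        for a in names
    ]
    return names, matrix


# Toy alignment with hypothetical genome names.
toy_alignment = {
    "g1": "ACGTACGT",
    "g2": "ACGTACGA",
    "g3": "AC-TTCGA",
}
names, matrix = distance_matrix(toy_alignment)
for name, dist_row in zip(names, matrix):
    print(name, [round(d, 3) for d in dist_row])
```

The matrix is symmetric with zeros on the diagonal, which is the shape a distance-based ordination expects.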
Visualization
The phylogenetic trees were visualized using GraPhlAn and FigTree.
Further approaches
Paper: Insights on the Evolutionary Genomics of the Blautia Genus: Potential New Species and Genetic Content Among Lineages
Journal: Frontiers in Microbiology
Year: 2021
Strategy 4
OrthoFinder to obtain conserved gene families (orthogroups)
Perl script to retrieve the protein sequences
MAFFT (L-INS-i mode) to align the orthogroups
ModelTest-NG:
model selection by the Akaike information criterion (AIC)
IQ-TREE 2:
1000 ultrafast bootstrap (UFBoot) replicates
UFBoot trees optimized by NNI (-bnni)
SH-like approximate likelihood ratio test (-alrt)
Strategy 5
panX core genome used to construct a single nucleotide polymorphism (SNP)-based tree
cophylo to compare the phylogenomic amino acid tree with the SNP-based tree
TypeMat from the Microbial Genomes Atlas (MiGA) for taxonomic classification of the genomes, removing those with an anomalous classification