phyloseq version 1.46
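Anyone familiar with microbial community ecology analysis has most likely used the phyloseq package. While processing data yesterday, I noticed something interesting about its main function: a phyloseq-class object can still be built even when the taxa abundance table, the phylogenetic tree, and the metadata do not correspond. As I remember it, earlier versions required a fairly strict correspondence (names, order, and so on) among the abundance table, the tree, and the metadata, and building the phyloseq-class object would otherwise fail with an error.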
I checked the help file: "One or more component objects among the set of classes defined by the phyloseq package, as well as phylo-class (defined by the ape-package). Each argument should be a different class. For combining multiple components of the same class, or multiple phyloseq-class objects, use the merge_phyloseq function. Unlike in earlier versions, the arguments to phyloseq do not need to be named, and the order of the arguments does not matter."
The phyloseq version installed on this machine is 1.46.0.

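Take a dataset bundled with the package as an example. The esophagus dataset is a phyloseq-class object with 58 taxa and 3 samples, and it also contains a phylogenetic tree. Note: <u>after modifying the esophagus abundance table (one sample and one taxon removed), the phyloseq main function still builds a phyloseq-class object without raising an error.</u>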
> data(esophagus)
> esophagus
phyloseq-class experiment-level object
otu_table() OTU Table: [ 58 taxa and 3 samples ]
phy_tree() Phylogenetic Tree: [ 58 tips and 57 internal nodes ]
> x1 = phyloseq(otu_table(esophagus)[-1,-1], phy_tree(esophagus))
> identical(x1, esophagus)
[1] FALSE
# Drop the first sample and the first taxon; the table should no longer match the tree.
> dim(otu_table(esophagus)[-1,-1])
[1] 57 2
> phy_tree(esophagus)
Phylogenetic tree with 58 tips and 57 internal nodes.
Tip labels:
59_8_22, 59_5_13, 59_8_12, 65_3_22, 65_5_1, 65_1_10, ...
Rooted; includes branch lengths.
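In fact, though, the newer phyloseq function takes the "intersection" of the supplied components: it drops the extra samples or prunes the extra tips from the phylogenetic tree. As shown below, re-extracting the tree from x1 shows that one taxon is gone (the first taxon, "59_8_22").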
> phy_tree(x1)
Phylogenetic tree with 57 tips and 56 internal nodes.
Tip labels:
59_5_13, 59_8_12, 65_3_22, 65_5_1, 65_1_10, 65_7_12, ...
Rooted; includes branch lengths.
> phy_tree(x1)$tip.label
[1] "59_5_13" "59_8_12" "65_3_22" "65_5_1" "65_1_10" "65_7_12" "59_6_1" "65_2_17" "65_9_26"
[10] "65_5_18" "65_7_5" "65_8_12" "65_9_1" "59_9_26" "9_6_28" "65_7_4" "65_4_26" "9_2_24"
[19] "65_1_17" "65_6_7" "9_4_3" "65_7_18" "65_8_7" "59_4_5" "59_5_2" "65_6_2" "65_4_10"
[28] "9_7_25" "9_4_5" "65_1_8" "65_2_5" "59_7_6" "59_9_31" "9_1_7" "9_4_13" "9_6_3"
[37] "65_4_5" "59_9_18" "65_9_13" "65_3_18" "65_6_10" "9_7_18" "65_8_25" "59_3_5" "65_8_29"
[46] "65_4_20" "59_6_11" "59_9_17" "9_4_6" "59_4_25" "59_3_21" "59_3_19" "59_4_16" "59_8_3"
[55] "59_5_19" "65_9_9" "59_2_6"
> row.names(otu_table(esophagus))
[1] "59_8_22" "59_5_13" "59_8_12" "65_3_22" "65_5_1" "65_1_10" "65_7_12" "59_6_1" "65_2_17"
[10] "65_9_26" "65_5_18" "65_7_5" "65_8_12" "65_9_1" "59_9_26" "9_6_28" "65_7_4" "65_4_26"
[19] "9_2_24" "65_1_17" "65_6_7" "9_4_3" "65_7_18" "65_8_7" "59_4_5" "59_5_2" "65_6_2"
[28] "65_4_10" "9_7_25" "9_4_5" "65_1_8" "65_2_5" "59_7_6" "59_9_31" "9_1_7" "9_4_13"
[37] "9_6_3" "65_4_5" "59_9_18" "65_9_13" "65_3_18" "65_6_10" "9_7_18" "65_8_25" "59_3_5"
[46] "65_8_29" "65_4_20" "59_6_11" "59_9_17" "9_4_6" "59_4_25" "59_3_21" "59_3_19" "59_4_16"
[55] "59_8_3" "59_5_19" "65_9_9" "59_2_6"
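The "intersection" behaviour can be mimicked in base R to see exactly which names would be dropped. This is only a sketch, not phyloseq's internal code: the helper name `report_dropped` is made up for illustration, and the ID vectors are a shortened subset of the names printed above.

```r
# Hypothetical helper (not part of phyloseq): compare two sets of IDs and
# report what an intersection-style merge would keep and drop.
report_dropped <- function(table_ids, tree_ids) {
  list(
    kept          = intersect(table_ids, tree_ids),
    only_in_table = setdiff(table_ids, tree_ids),
    only_in_tree  = setdiff(tree_ids, table_ids)
  )
}

# Shortened example: the tree still has the first taxon, the table does not.
tree_ids  <- c("59_8_22", "59_5_13", "59_8_12", "65_3_22")
table_ids <- tree_ids[-1]

res <- report_dropped(table_ids, tree_ids)
print(res$only_in_tree)   # tips that would be pruned from the tree
print(res$only_in_table)  # taxa that would be dropped from the table
```

With the real objects, the inputs could presumably be `taxa_names(otu_table(esophagus)[-1, -1])` and `phy_tree(esophagus)$tip.label`.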
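Relaxing the requirements for building a phyloseq-class object is certainly convenient, and the maddening errors are gone. However, if the sample names in the abundance table differ from those in the metadata, information can be lost silently (because of the "intersection"), and with a large number of samples such mismatches do happen (unless you are detail-oriented and check every name yourself). In addition, later diversity and multivariate analyses still need the tables to be in matching order; otherwise they have to be reprocessed when plotting. So I personally still prefer to check that the abundance table, the phylogenetic tree, and the metadata correspond before building the phyloseq-class object.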
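One way to do such a check before calling phyloseq is sketched below in base R. The helper `check_same_ids` and the toy `abund` and `meta` objects are hypothetical, invented for this example. The sketch stops on any name mismatch, then reorders the metadata to follow the sample order of the abundance table.

```r
# Hypothetical helper (not part of phyloseq): stop unless both ID sets match.
check_same_ids <- function(a, b, what = "IDs") {
  missing_in_b <- setdiff(a, b)
  missing_in_a <- setdiff(b, a)
  if (length(missing_in_b) > 0 || length(missing_in_a) > 0) {
    stop(sprintf("%s differ: %s", what,
                 paste(c(missing_in_b, missing_in_a), collapse = ", ")))
  }
  invisible(TRUE)
}

# Toy abundance table (taxa x samples) and metadata with rows in another order.
abund <- matrix(1:6, nrow = 2,
                dimnames = list(c("taxon1", "taxon2"), c("S1", "S2", "S3")))
meta  <- data.frame(group = c("b", "c", "a"),
                    row.names = c("S2", "S3", "S1"))

# Fails loudly instead of silently taking the intersection.
check_same_ids(colnames(abund), rownames(meta), what = "Sample names")

# Reorder the metadata rows to follow the sample order of the abundance table.
meta <- meta[match(colnames(abund), rownames(meta)), , drop = FALSE]
```

The same pattern would apply to taxa names against tree tip labels. Erroring here is a deliberate choice: it surfaces typos in sample names that the intersection would otherwise hide.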
phyloseq version 1.50
> sessionInfo()
R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)
other attached packages:
[1] remotes_2.5.0 phyloseq_1.50.0 BiocManager_1.30.25
> data(esophagus)
> esophagus
phyloseq-class experiment-level object
otu_table() OTU Table: [ 58 taxa and 3 samples ]
phy_tree() Phylogenetic Tree: [ 58 tips and 57 internal nodes ]
> x1 = phyloseq(otu_table(esophagus)[-1,-1], phy_tree(esophagus))
> identical(x1, esophagus)
[1] FALSE
phyloseq version 1.30.0
> data(esophagus)
> x1 = phyloseq(otu_table(esophagus)[-1,-1], phy_tree(esophagus))
> identical(x1, esophagus)
[1] FALSE
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22631)
other attached packages:
[1] phyloseq_1.30.0 BiocManager_1.30.25 openxlsx_4.1.5
P.S. It turns out the behavior was already like this in 1.30.0; I don't know whether that is the earliest version where it changed. :dog: :dog: :dog: