非度量多维尺度法(NMDS)是一种将多维空间的研究对象(样本或变量)简化到低维空间进行定位、分析和归类,同时又保留对象间原始关系的数据分析方法。
适用于无法获得研究对象间精确的相似性或相异性数据,仅能得到他们之间等级关系数据的情形。换句话说,当资料不适合直接进行变量型多维尺度分析时,对其进行变量变换,再采用变量型多维尺度分析。其特点是根据样品中包含的物种信息,以点的形式反映在多维空间上,而对不同样品间的差异程度,则是通过点与点间的距离体现的,最终获得样品的空间定位点图;
另外,针对分离度不好的样品,spider plot 可能会有奇效。
示例
Non-metric multidimensional scaling (NMDS) plot of β-diversity based on Bray-Curtis dissimilarities in the bacteriome of the piglet GI tract. Ellipses indicate 1 standard deviation from organ centroid and spiders are drawn to GI tract region centroid. Colors indicate GI tract region, symbols indicate litter, and ellipses line types indicate specific organs of the upper and lower GI.
脚本
数据样式:
- OTU丰度数据就是一般OTU表或注释后的OTU丰度表,每一行为一个OTU,每一列为一个样品。
- 分组数据为跟样品一一对应的分组数据。
Notice:利用OTU表做NMDS时,可以获得(样本+物种)两种得分信息,能够找到更多的潜在信息。
library(vegan)
library(ggplot2)
# data --------------------------------------------------------------------
set.seed(13)
otu <- matrix(sample(c(0:200), 1200, replace = TRUE),
ncol = 12, nrow = 100,
dimnames = list(
row_names = paste0("OTU", seq(1:100)),
col_names = paste0("sample", seq(1:12))
))
groups <- data.frame(
sample = paste0("sample", seq(1:12)),
sites = rep(c("root", "soil", "rhizosphere"), 4)
)
# hellinger-transform: square root of method = "total"
otu <- t(otu)
hell_otu <- decostand(otu, "hell")
# The number of points n should be n > 2*k + 1
# default k = 2
NMDS <- metaMDS(hell_otu, k = 4, distance = "bray")
# print NMDS
# stress 应该越小越好,通常阈值为0.2
NMDS
# Get Species or Site Scores from an Ordination
score_species <- scores(NMDS, display = "species")
score_sites <- scores(NMDS, display = "sites")
# get the stress value
stress <- round(NMDS$stress, 4)
# adds group date
# 有时groups中的sample 和score 的结果顺序不一样
# 推荐使用merge 或者 match 函数合并数据
plot_data <- cbind(as.data.frame(score_sites), groups)
# set data for spider plot ------------------------------------------------
# calculate the center of NMDS1, NMDS2, according to groups
cent <- aggregate(cbind(NMDS1, NMDS2) ~ sites,
data = plot_data, FUN = mean)
# summarise_if 有利于自动化
plot_data %>%
group_by(sites) %>%
summarise_if(is.numeric, mean)
cent <- setNames(cent, c("sites", "oNMDS1", "oNMDS2"))
spider_data <- merge(plot_data, cent, by = "sites", sort = FALSE)
# 设置坐标轴刻度label
theme <- function(){
list(scale_x_continuous(breaks = seq(from = -0.1,
to = 0.1,
by = 0.05),
labels = seq(from = -0.1,
to = 0.1,
by = 0.05)),
scale_y_continuous(breaks = seq(from = -0.1,
to = 0.1,
by = 0.05),
labels = seq(from = -0.1,
to = 0.1,
by = 0.05)))
}
# 可视化
ggplot(plot_data, aes(x = NMDS1, y = NMDS2,
color = sites)) +
geom_point() +
coord_fixed() +
stat_ellipse()
ggplot(spider_data, aes(x = NMDS1, y = NMDS2,
color = sites)) +
geom_segment(aes(xend = oNMDS1, yend = oNMDS2)) +
geom_point() +
geom_point()
NMDS结果评估
通常情况下我们可以根据应力值来判断排序模型的合理性,应力值(Stress)最好不要大于0.2。
此外,还可以通过goodness()和stressplot() {vegan}来评估NMDS拟合优度。
# Shepard图: 若R2越大,则NMDS结果越合理
stressplot(NMDS, main = "Shepard Plot")
gof <- goodness(NMDS1)
plot(NMDS, display = "sites", type = "t", main = "goodness of fit statistic")
points(NMDS, cex = gof * 200, col = "red")
Reference
Arfken AM, Frey JF, Ramsay TG, Summers KL. Yeasts of Burden: Exploring the Mycobiome-Bacteriome of the Piglet GI Tract. Front Microbiol. 2019 Oct 8;10:2286. doi: 10.3389/fmicb.2019.02286