曼哈顿图展示差异OTUs

简介

曼哈顿图(Manhattan Plot)本质上是散点图，一般用于展示大量非零的波动数据，散点在y轴的高度突出其属性异于其他低点:最早应用于全基因组关联分析(GWAS)研究中，y轴高点显示出具有强相关性的位点。

在微生物组中，曼哈顿图在展示差异OTUs上下调情况、差异OTUs归属情况等中有比较好的效果。一般来说，在扩增子中的曼哈顿图包括：

X轴：按OTUs不同分类学水平的字母顺序排列，一般不展示OTU编号标签：如果想以门水平着色，则建议按门——纲——目进行排序。
Y轴：-log10(Pvalue)；由于Pvalue的范围是从0-1，并且我们希望它越小越好，但直接展示时位于最下方，不符合直觉认知；因此，对Pvalue进行-log10转换是非常好的方法，此时高显著性(Pvalue趋近零)的值被高位显示，独立于其他值。
水平线：一般表示显著性水平阈值，方便直观了解每个点的显著性水平；
散点色彩：一般用于展示分类学水平；
散点形状：一般用于展示富集情况;
散点大小：一般用于展示相对丰度值；

注：该版本为初稿，后期将更新至MicrobiomeStatPlot

实例解读

例1. 曼哈顿图展示差异富OTUs分布情况¹

knitr::include_graphics('曼哈顿图展示差异富OTUs分布情况.png')

image.png

图1. 曼哈顿图展示差异富OTUs分布情况(Rafal Zgadzaj et al., 2016)

图片描述：

结果总体描述：我们根据OTUs的分类情况(Phylum, Class, Order)对群落变化进行了剖析，并通过曼哈顿图展示了野生型和突变体在根或根际的富集情况.
图1结果描述(Fig. 5 C and D)：与WT相比，突变株中显著富集的OTUs的数量和多样性都显著增加(实心点大小和数量均增加)。
元素解释：
- …respect to root：根内菌为统计对照，计算富集的OTUs；
- 散点：代表单个OTU；
- 散点大小：相对丰度；
- 散点形状：显著富集实心圆点，否则为圆环；
- 灰色背景：间隔每个目水平(或强调是否存在显著富集OTUs)；
- 水平线：显著性水平p = 0.05；

Manhattan plots showing root-enriched OTUs in WT (A) or in the mutants (B) with respect to rhizosphere and rhizosphere-enriched OTUs in WT (C) or in the mutants (D) with respect to root. OTUs that are significantly enriched (also with respect to soil) are depicted as full circles. The dashed line corresponds to the false discovery rate-corrected P value threshold of significance (α = 0.05). The color of each dot represents the different taxonomic affiliation of the OTUs (order level), and the size corresponds to their RAs in the respective samples [WT root samples (A), mutant root samples (B), WT rhizosphere samples (C), and mutant rhizosphere samples (D)]. Gray boxes are used to denote the different taxonomic groups (order level).

注意事项：

此处数据通过每个OTU的总体相对丰度 > 0.05%筛选；

例2. 曼哈顿图展示籼稻或粳稻中富集的OTUs²

knitr::include_graphics('曼哈顿图展示籼稻或粳稻中富集的OTUs.png')

image.png

图2. 曼哈顿图展示籼稻或粳稻中富集的OTUs(Jingying Zhang et al., 2019)

图片描述：

总体结果描述：首先，我们利用曼哈顿图对OTUs进行富集分析。
- 在籼稻中富集门水平包括：Acidobacteria, Proteobacteria, Actinobacteria, Bacteroidetes, Chloroflexi, Firmicutes and Verrucomicrobia;
- 在粳稻中富集门水平主要是：Proteobacteria, Bacteroidetes and Firmicutes;
- 并且发现其了重叠的富集区域;
元素解释：
- 每个点或三角形代表一个OTU；
- 点的形状：indica中富集OTU为实心三角形；japonica中富集OTU为空心三角形；实心点表明无显著富集；
- OTUs按照门的分类字母顺序排列和着色，变形杆菌则按照纲排列；
- 与x平行虚线：应该是显著富集的OTU中最小的-log10(Pvalue);

Manhattan plot showing OTUs enriched in indica or japonica in field I (a) and field II (b). Each dot or triangle represents a single OTU. OTUs enriched in indica or japonica are represented by filled or empty triangles, respectively (FDR adjusted P < 0.05, Wilcoxon rank sum test). OTUs are arranged in taxonomic order and colored according to the phylum or, for Proteobacteria, the class. CPM, counts per million.

例3. 曼哈顿图展示三萜突变体对微生物组的调控作用³

knitr::include_graphics('曼哈顿图展示三萜突变体对微生物组的调控作用.png')

image.png

图3. 曼哈顿图展示三萜突变体对微生物组的调控作用(Ancheng C. Huang et al., 2019)

图片描述：

元素解释：
- 每个点代表一个OTU；
- 散点形状：显著富集——实心上三角；显著下调——空心下三角；
- OTUs按物种分类水平字母排序，并按门或纲水平(个别)着色；

Manhattan plots showing modulation of OTUs in the microbiota of triterpene mutants. Manhattan plot showing enriched and depleted OTUs. Each dot or triangle represents a single OTU. OTUs enriched or depleted in mutants are shown with filled or empty triangles (P < 0.05), respectively. OTUs are arranged according to taxonomy and colored at the phylum or class levels. CPM represents counts per million.

最近发表的其他文章^4,5也用到曼哈顿图对富集OTUs进行了展示，图片元素基本与上述三个例子相同。

绘图实战

为了便于后期添加元素，此处将数据处理和可视化分开，使用时可以随时检查，便于使用。

环境设置

本教程需要在R语言环境下运行，推荐在RStudio界面中学习。目前测试版本为：Windows 7环境，R 3.6.3和 RStudio 1.2.5033。理论上Mac、Linux系统，以及R或RStudio的更新版本是兼容的，但并没有广泛测试，有问题欢迎自行解决并在群在与同行分享经验。

按需求安装，没必要每次都运行该安装代码。

if (!requireNamespace("tidyverse", quietly = TRUE))
  install.packages("tidyverse")

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

if (!requireNamespace("edgeR", quietly = TRUE))
  BiocManager::install("edgeR")

数据处理函数介绍

otu_table数据源：MicrobiomeStatPlot项目中的示例数据。

加载需要用的包

library(edgeR)
library(tidyverse)

enrichment_analyses参数介绍：
- otu: otu_table
- design: 实验设计信息，样品信息的列名设置为samples，分组信息列名设置为group
- taxonomy_table: OTUs的分类水平表
- threshold: 相对丰度阈值，希望丢弃(获取)的相对丰度大小
- contrasts: 指定对比矩阵(“KO-WT”：表示KO相对于WT显著富集的OTUs)

enrichment_analyses <- function(otu_table, design, taxonomy_table, threshold, contrasts){

  ## filtering relative abundance by thresholding
  otu_relative <- apply(otu_table, 2, function(x){x/sum(x)})
  if (missing(threshold))
    threshold = 0.0005
  idx <- rowSums(otu_relative > threshold) >= 1
  otu <- as.data.frame(otu_table[idx, ])
  otu_relative <- as.data.frame(otu_relative[idx, ])

  ## construct a DGEList
  dge_list <- DGEList(counts = otu, group = design$group)

  ## Remove the lower abundance(In this case, it's usually useless)
  keep <- rowSums(dge_list$counts) >= 0
  dge_keep <- dge_list[keep, ,keep.lib.sizes = FALSE]

  ## scale the raw library sizes dgelist
  dge <- calcNormFactors(dge_keep)

  ## set the design_mat
  design_mat <- model.matrix(~0 + dge$samples$group)
  colnames(design_mat) <- gsub("([dge$samples$group])", 
                               "", colnames(design_mat))
  ## fit the GLM
  GLMC = estimateGLMCommonDisp(dge, design_mat)
  GLMT = estimateGLMTagwiseDisp(GLMC, design_mat)
  fit = glmFit(GLMT, design_mat)

  ## conducts likelihood ratio tests
  contrast_mat <- makeContrasts(
    contrasts = contrasts, 
    levels = colnames(design_mat))

  ## Fit a negative binomial generalized log-linear model to the read counts
  lrt = glmLRT(fit, contrast = contrast_mat)

  ## Multiple Testing
  de_lrt <- decideTestsDGE(lrt, adjust.method = "fdr", p.value = 0.05)

  ## extract values
  data <- lrt$table
  data$sign_level <- de_lrt@.Data

  ## enrichment status
  data$enrichment <- as.factor(ifelse(data$sign_level == 1, "enriched", ifelse(data$sign_level == -1, "depleted","nosig")))
  ## get the out's names
  data$otus <- rownames(data)

  ## negative logarithmes transformation
  data$neglog_p = -log(data$PValue)

  ## remove low foldchange
  idx <- data$logFC < 0
  data$neglog_p[idx] <- 0

  ## reorder OTUs according to taxonomy(Phylum, Class, Order)
  taxonomy <- taxonomy_table[order(taxonomy_table[, 3], 
                                   taxonomy_table[, 4], 
                                   taxonomy_table[, 5]), ]

  idx <- taxonomy[, 1] %in% data$otu
  taxonomy <- taxonomy[idx, ]

  idx <- match(taxonomy[, 1], data$otu)
  data <- data[idx, ]

  data$classification  <- taxonomy[, 3]
  data$otu <- factor(data$otu, levels = data$otu)

  ## calculating the relative abundance 
  ra <- apply(otu_relative, 1, mean)
  data$ra <- ra[match(data$otu, names(ra))]

  return(data)
}

threshold_line 参数介绍：
- Pvalue：期望的显著性P值得阈值(0.01, 0.05)
- data：enrichment_analyses 计算结果

threshold_line <- function(data, Pvalue){
  if (missing(Pvalue)){
    FDR <- min(data$neglog_p[data$enrichment == "enriched"])}
  else {
    FDR <- -log(Pvalue)}
 return(FDR) 
}

basic_theme:包含基本作图主题参数

basic_theme <- theme(panel.background = element_blank(),
                     panel.grid = element_blank(),
                     axis.ticks.x = element_blank(),
                     axis.text.x = element_blank(),
                     axis.line = element_line(size = 1),
                     legend.background = element_blank(),
                     legend.key = element_blank(),
                     legend.position = "top")

数据处理示例

## file_otu <-  "...\\otutab.txt"
## file_tax <- "...\\taxonomy.txt"
## loading data
otu <- read.delim("otutab.txt", header = TRUE, 
                  sep = "\t", row.names = 1,
                  stringsAsFactors = FALSE)

taxonomy <- read.delim("taxonomy.txt", header = TRUE, 
                       sep = "\t", 
                       stringsAsFactors = FALSE)

design <- data.frame(
  samples = colnames(otu),
  group = rep(c("KO", "WT", "OE" ),each = 6))

## Running enrichment_analyses

data <- enrichment_analyses(otu_table = otu, design = design, taxonomy_table = taxonomy, contrasts = "KO-WT")

FDR <- threshold_line(data = data)

ggpplot2 可视化示例

ggplot(data, aes(x = otu, y = neglog_p, color = classification, size = ra, shape = enrichment)) +
  geom_point(alpha = 0.8) +
  geom_hline(yintercept = FDR, linetype = 2, color = "red") +
  scale_shape_manual(values=c("enriched" = 17, 
                                "depleted" = 25, 
                                "nosig" = 20))+
  labs(x="OTUs", y = expression(~-log[10](P))) +
  guides(colour = "none") +
  basic_theme

image.png

图4. 曼哈顿图展示富集OTUs

间隔灰色设置

rect_coordinates:
- 计算灰度注释时，需要先设定x索引
- data：enrichment_analyses 计算结果

rect_coordinates <- function(data){
  manhattan_rect <- subset(data, select = c(classification,
                                            x_index))

  rect_coordinates <- manhattan_rect %>% 
    group_by(classification) %>% 
    summarise(xmin = min(x_index),
              xmax = max(x_index))
  return(rect_coordinates)}

ggplot2 可视化

data$x_index <- 1:nrow(data)
rect_coordinates <- rect_coordinates(data = data)

p <- ggplot(data, aes(x = x_index, y =  neglog_p))

for (i in 1:(nrow(rect_coordinates))) 
  p <- p + annotate('rect', 
                    xmin = rect_coordinates$xmin[i], 
                    xmax = rect_coordinates$xmax[i], 
                    ymin = 0, 
                    ymax = Inf,
                    fill = ifelse(i %% 2 == 0, 
                                  'gray50', 
                                  'gray95'))

p + geom_point(aes(color = classification, 
               size = ra, shape = enrichment)) + 
  geom_hline(yintercept = FDR, linetype = 2, color = "red") +
  scale_shape_manual(values=c("enriched" = 17, 
                              "depleted" = 25, 
                              "nosig" = 20))+
  labs(x="OTUs", y = expression(~-log[10](P))) +
  scale_x_continuous(expand = c(0, 0)) +
  guides(colour = "none") +
  basic_theme

image.png

图5. 灰区间隔的曼哈顿图展示富集OTUs

丰度排序着色

top_class:
- 计算灰度注释时，需要先设定x索引
- data：enrichment_analyses 计算结果
- top: 需要突出的丰度排名

top_class <- function(data, top){
  manhattan_rect <- subset(data, select = c(classification,
                                            x_index, 
                                            enrichment, 
                                            ra))
  rect_coordinates <- manhattan_rect %>% 
    group_by(classification) %>% 
    summarise(xmin = min(x_index),
              xmax = max(x_index),
              center = (xmin + xmax)/2)

  enriched_data <- subset(manhattan_rect, enrichment == "enriched")

  enriched_data %>% 
    group_by(classification) %>% 
    summarise(sum_ra = sum(ra)) -> enriched_ra             

  fill_data <- enriched_ra[order(enriched_ra$sum_ra, decreasing = TRUE), 1]

  fill_data <- as.data.frame(fill_data[c(1:top),])

  class_name <- merge(fill_data, rect_coordinates, by = "classification",
                      sort = FALSE)
  class_name <- class_name[c(1:top), ]

  reslut <- list(rect_coordinates = rect_coordinates, 
         fill_data = unlist(fill_data), 
         class_name = class_name)
  return(reslut)
}

ggplot 可视化

将显著富集的OTUs 按总相对丰度排序，用较深灰色突出并注明分类水平名称。

data$x_index <- 1:nrow(data)
reslut <- top_class(data = data, top = 3)

rect_coordinates <- reslut$rect_coordinates
fill_data <- reslut$ fill_data
class_name<- reslut$class_name

p <- ggplot(data, aes(x = x_index, y=  neglog_p))

for (i in 1:(nrow(rect_coordinates))) 
  p <- p + annotate('rect', 
                    xmin = rect_coordinates$xmin[i], 
                    xmax = rect_coordinates$xmax[i], 
                    ymin = 0, 
                    ymax = ceiling(max(data$neglog_p)/5)*5,
                    fill = ifelse(rect_coordinates$classification[i] %in% fill_data, 
                                  'gray65', 
                                  'gray85'))

p + geom_point(aes(color = classification, size = ra, 
                   shape = enrichment)) +
  geom_hline(yintercept=FDR, linetype = 2, color = "red") +
  scale_shape_manual(values=c("enriched" = 17, 
                              "depleted" = 25, 
                              "nosig" = 20))+
  labs(x="OTUs", y="-log10(P)") +
  guides(colour = "none") +
  basic_theme + 
  scale_x_continuous(expand = c(0, 0)) +
  coord_cartesian(clip = "off") +
  geom_text(data = class_name, aes(x = center,
                                   y = Inf,
                                   label = classification))

image.png

图6. 深灰色突出高丰度富集OTUs

1. Zgadzaj, R. et al. Root nodule symbiosis in lotus japonicus drives the establishment of distinctive rhizosphere, root, and nodule bacterial communities. Proc Natl Acad Sci U S A 113, E7996–E8005 (2016).

2. Zhang, J. et al. NRT1.1B is associated with root microbiota composition and nitrogen use in field-grown rice. Nat Biotechnol 37, 676–684 (2019).

3. Huang, A. C. et al. A specialized metabolic network selectively modulates arabidopsis root microbiota. Science 364, (2019).

4. Wang, J. et al. Arsenic concentrations, diversity and co-occurrence patterns of bacterial and fungal communities in the feces of mice under sub-chronic arsenic exposure through food. Environ Int 138, 105600 (2020).

5. Zhu, M. et al. Synchronous response in methanogenesis and anaerobic degradation of pentachlorophenol in flooded soil. J Hazard Mater 374, 258–266 (2019).

曼哈顿图展示差异OTUs

简介

实例解读

例1. 曼哈顿图展示差异富OTUs分布情况1

例2. 曼哈顿图展示籼稻或粳稻中富集的OTUs2

例3. 曼哈顿图展示三萜突变体对微生物组的调控作用3

绘图实战

环境设置

数据处理函数介绍

数据处理示例

ggpplot2 可视化示例

间隔灰色设置

丰度排序着色

例1. 曼哈顿图展示差异富OTUs分布情况¹

例2. 曼哈顿图展示籼稻或粳稻中富集的OTUs²

例3. 曼哈顿图展示三萜突变体对微生物组的调控作用³