R语言数据可视化_科学统计图表1——ggplot绘制火山图

什么是火山图******?

火山图中每个点代表一个基因,常被用于展示差异,比如差异表达基因、差异微生物等等。

火山图的两轴分别表示基因表达值差异的显著性和差异的程度

(1)火山图的y轴是-log10(PValue),有时也可以利用PValue的校正值QValue,因此数值越高说明PValue越小,即差异越显著。

(2)横坐标是Log2(FD),即对Fold Change取log2。所以图中的点越靠两侧,基因表达量上调或者下调程度越大。

一般来说在差异基因分析过程中,我们通常将 PValue小于0.05且FD的绝对值大于2为差异基因。

当然在差异基因数量过多的时候,我们可以调节筛选标准,选择适当数量的差异表达基因投入后续的统计流程中。但是一般来讲,不会设置PValue的值大于0.05。


话不多说,代码如下


plot_Volcano_1 <- function(result,

                        logFC = 1.5,adj_P = 0.05,

                        label_geneset = NULL){

  result$change = "NONE"

  result$change[which(result$log2FoldChange > logFC & result$padj<adj_P)] = "UP"

  result$change[which(result$log2FoldChange < (-logFC) & result$padj<adj_P)] = "DOWN"

  xlim = max(abs(result$log2FoldChange))

  if(is.null(label_geneset)){

    p = ggplot(result,aes(x = log2FoldChange, y = -log10(padj)))+

      geom_point(data = result,aes( x = log2FoldChange, y = -log10(padj),color = change))+

      theme_bw()+

      geom_vline(xintercept = c(-logFC,logFC),lty = 2)+

      geom_hline(yintercept = c(-log10(adj_P)),lty = 2)+

      scale_x_continuous(limits = c(-xlim,xlim))+

      coord_fixed(ratio = ( 2*xlim )/(max(-log10(result$padj),na.rm = T))  )+

      theme(panel.grid = element_blank(),

            panel.grid.minor = element_blank(),

            axis.text = element_text(color = "black"))+

      xlab("log10FoldChange")+ylab("-log10P-value")+

      scale_color_manual(values = c("NONE" = "grey","UP" = "red","DOWN" = "blue"))

  }else{

    p = ggplot(result,aes(x = log2FoldChange, y = -log10(padj)))+

      geom_point(data = result,aes( x = log2FoldChange, y = -log10(padj),color = change))+

      geom_point(data = result[which(result$Row.names %in% label_geneset),],

                aes( x = log2FoldChange, y = -log10(padj)),color = "black",size = 4)+

      geom_point(data = result[which(result$Row.names %in% label_geneset),],

                aes( x = log2FoldChange, y = -log10(padj)),color = "white",size = 2.5)+

      geom_point(data = result[which(result$Row.names %in% label_geneset),],

                aes( x = log2FoldChange, y = -log10(padj),color = change),size = 1.5)+

      geom_text_repel(data = result[which(result$Row.names %in% label_geneset),],

                      aes( x = log2FoldChange, y = -log10(padj),label = Row.names),fontface = "italic")+

      theme_bw()+

      geom_vline(xintercept = c(-logFC,logFC),lty = 2)+

      geom_hline(yintercept = c(-log10(adj_P)),lty = 2)+

      scale_x_continuous(limits = c(-xlim,xlim))+

      coord_fixed(ratio = ( 2*xlim )/(max(-log10(result$padj),na.rm = T))  )+

      theme(panel.grid = element_blank(),

            panel.grid.minor = element_blank(),

            axis.text = element_text(color = "black"))+

      xlab("log10FoldChange")+ylab("-log10P-value")+

      scale_color_manual(values = c("UP" = "red","DOWN" = "blue","NONE" = "grey"))

  }

  return(p)

}

图片效果:

image.png

另一种当需要标记的基因较多的时候,火山图的常见格式


plot_Volcano_2 <- function(result,

                          logFC = 1.5,adj_P = 0.05,

                          label_geneset = NULL){

  result$change = "NONE"

  result$change[which(result$log2FoldChange > logFC & result$padj<adj_P)] = "UP"

  result$change[which(result$log2FoldChange < (-logFC) & result$padj<adj_P)] = "DOWN"

  xlim = max(abs(result$log2FoldChange))

  if(is.null(label_geneset)){

    p = ggplot(result,aes(x = log2FoldChange, y = -log10(padj)))+

      geom_point(data = result,aes( x = log2FoldChange, y = -log10(padj),color = change))+

      theme_bw()+

      geom_vline(xintercept = c(-logFC,logFC),lty = 2)+

      geom_hline(yintercept = c(-log10(adj_P)),lty = 2)+

      scale_x_continuous(limits = c(-xlim,xlim))+

      coord_fixed(ratio = ( 2*xlim )/(max(-log10(result$padj),na.rm = T))  )+

      theme(panel.grid = element_blank(),

            panel.grid.minor = element_blank(),

            axis.text = element_text(color = "black"))+

      xlab("log10FoldChange")+ylab("-log10P-value")+

      scale_color_manual(values = c("NONE" = "grey","UP" = "red","DOWN" = "blue"))

  }else{

    p = ggplot(result,aes(x = log2FoldChange, y = -log10(padj)))+

      geom_point(data = result,aes( x = log2FoldChange, y = -log10(padj),color = change))+

      geom_label_repel(data = result[which(result$Row.names %in% label_geneset),],

                      aes( x = log2FoldChange, y = -log10(padj),label = Row.names,fill = change),color = "white",fontface = "italic")+

      theme_bw()+

      geom_vline(xintercept = c(-logFC,logFC),lty = 2)+

      geom_hline(yintercept = c(-log10(adj_P)),lty = 2)+

      scale_x_continuous(limits = c(-xlim,xlim))+

      coord_fixed(ratio = ( 2*xlim )/(max(-log10(result$padj),na.rm = T))  )+

      theme(panel.grid = element_blank(),

            panel.grid.minor = element_blank(),

            axis.text = element_text(color = "black"))+

      xlab("log10FoldChange")+ylab("-log10P-value")+

      scale_color_manual(values = c("UP" = "red","DOWN" = "blue","NONE" = "grey"))+

      scale_fill_manual(values = c("UP" = "red","DOWN" = "blue","NONE" = "grey"))

  }

  return(p)

}

图片如下:

image.png

私信我,提供专业的生信服务!

©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容