绘图场景:在高通量检测数据中,我们在数据展示上往往都喜欢将感兴趣的功能或者通路基因绘制成热图形式,能够让读者一目了然的发现某功能是增强了?还是减弱了?在实际过程中,表型实验上已经证明了炎症反应的走向,从初期增强到后期减弱。但在对应的高通量数据中却不是如我们所想,趋势很凌乱,很难说炎症这个过程的增强减弱。因此在拿到数据后,需要有选择性的筛选和展示。今天就以项目为例,记录整个绘制过程。
1. 对促炎数据做聚类分析并提取所需信息
首先清除环境,安装并加载所需要的R包
rm(list = ls()) #清除环境内存
#install.packages("pheatmap") #安装pheatmap包
#install.packages("readxl") #安装readxl包
#install.packages("ggplot2") #安装ggplot2包
#install.packages(c('officer', 'rvg', 'flextable', 'stargazer', 'broom' ))#export的依赖包
#下载export,链接https://cran.r-project.org/src/contrib/Archive/export/export_0.2.2.tar.gz
#install.packages("D:/R-3.6.2/library/export_0.2.2.tar.gz", repos = NULL)
library(ggplot2)
library(pheatmap) #加载pheatmap包
library(readxl) #加载readxl包
library(export) #用于输出不同格式的结果
读入促炎数据并对数据做简单处理
data_pos<-read_excel("demo3.xlsx",1) #读入excel数据
data_pos<-as.data.frame(data_pos) # 将data转换为data.frame格式
rownames(data_pos)<-data_pos[,1] #将第一列数据设置为行名
data_pos<-data_pos[,-1] #去除重复的第一列
colnames(data_pos) = gsub("_expression","",colnames(data_pos))
head(data_pos)
## 12h 24h 3d 7d
## Il1rl1 1.0475045 0.9948235 1.0268230 1.010622
## Tnfsf4 0.9327795 1.0742431 0.9676329 1.010017
## Tnfsf18 0.8085097 0.8327313 1.0099118 1.020472
## Hspd1 0.9905324 0.9932249 0.9839949 1.002263
## Pla2g4a 1.0225494 0.9872852 1.0146842 1.002581
## Xcl1 0.9028666 0.9687647 0.9989017 1.065237
绘图查看基因聚类情况
A<-pheatmap(data_pos,fontsize_col=12,fontsize_row=8,
show_rownames=F,show_colnames = T,
cluster_row=T,cluster_col=F,scale="row",
treeheight_row=25,treeheight_col=25,
border_color=NA,main="Heatmap",
color = colorRampPalette(c("blue", "white", "red"))(50),
cutree_rows=3) # 画图
ggsave(A,filename = "demo3_pos.pdf",width=3,height=5)
ggsave(A,filename = "demo3_pos.png",width=3,height=5)
从上图发现,基因按照表达模式可分为3大类,其中1和2类是想要的数据。
那如何提取这部分数据呢?
首先将表达数据与聚类分类信息合并
clus_a=cutree(A$tree_row,3) #获取按照3块,得到聚类模块信息
table(clus_a)# 每个聚类模块的数量
## clus_a
## 1 2 3
## 12 17 31
clus_a<-as.data.frame(clus_a)#转化为data.frame格式
data_pos= cbind(data_pos, clus_a)# 按列合并数据
data_pos #查看合并后数据
## 12h 24h 3d 7d clus_a
## Il1rl1 1.0475045 0.9948235 1.0268230 1.0106219 1
## Tnfsf4 0.9327795 1.0742431 0.9676329 1.0100173 2
## Tnfsf18 0.8085097 0.8327313 1.0099118 1.0204721 3
## Hspd1 0.9905324 0.9932249 0.9839949 1.0022628 3
## Pla2g4a 1.0225494 0.9872852 1.0146842 1.0025812 1
## Xcl1 0.9028666 0.9687647 0.9989017 1.0652369 3
## Tgm2 0.9719927 0.9908516 0.9937439 1.0165441 3
## S100a8 1.1048567 0.9885955 1.1913848 0.9661311 1
## Ctss 1.0075379 0.9725514 1.0238447 1.0097870 3
## Adora3 0.8768417 0.9808549 0.9766739 1.0384855 3
## Fabp4 1.0056085 1.0287909 0.9618795 0.9613461 2
## Tlr2 0.9925440 1.0575861 0.9961778 0.9766016 2
## S100a9 0.9364502 0.9293324 0.9946467 1.0106743 3
## Tlr4 1.0503438 1.0367557 1.0205949 0.9283988 1
## Zp3 0.8793431 0.9968576 1.0090100 1.0340921 3
## Clock 1.0138973 1.0231884 1.0113094 0.9766799 2
## Trpv4 0.9513371 0.9818378 1.0161012 1.0326306 3
## Ccl26 0.9786242 0.9316714 1.0050418 1.0356912 3
## Ccl24 0.9361875 0.9389519 1.0271715 1.0666504 3
## Serpine1 0.9992854 1.0101892 0.9987391 0.9768985 2
## Il17ra 1.0549646 1.0201907 0.9829150 0.9061829 1
## Tnfrsf1a 1.0196453 1.0356289 0.9909272 0.9341439 2
## Pde2a 0.9950061 1.0437024 1.0036394 0.9912396 2
## Gprc5b 0.9359255 0.9440990 1.0133725 1.0411849 3
## Adam8 1.0053608 1.0149603 1.0010803 1.0082890 2
## Cx3cl1 0.9103365 0.9611052 0.9865766 1.0385095 3
## Tlr3 1.0712287 0.9860523 1.0268784 0.9320602 1
## Ednra 0.9760352 1.0376250 0.9664353 1.0616230 3
## Ldlr 0.9951658 1.0412401 1.0044728 0.9757518 2
## Ets1 0.9649667 0.9932523 0.9633367 1.0178563 3
## Tlr9 0.9845119 1.0433439 1.0179088 0.9331340 2
## Hyal2 0.9805355 0.9952188 0.9537844 1.0180491 3
## Ccr2 1.0781580 0.9166186 0.9737452 0.9468374 1
## Ccr5 1.0379514 1.0791485 0.9571004 0.9994896 2
## Cdk19 1.0834689 0.9774149 0.9731305 1.0011084 1
## Egfr 0.9594616 0.9865012 0.9968854 1.0231812 3
## Ccl2 0.9410630 0.9234708 0.9172620 1.0855231 3
## Ccl7 0.9377025 0.9959727 0.9830637 0.9999279 3
## Ccl11 0.9395817 1.0046657 0.9952031 1.1051408 3
## Ccl12 1.0210894 0.9428178 0.9676126 1.0958590 3
## Ccl4 0.9173600 1.0137382 0.9935696 0.8478784 2
## Stat5a 0.9984534 1.0020705 0.9927384 1.0171578 3
## Ace 0.9411769 0.9791180 0.9885292 1.0293273 3
## Tnip1 0.9967821 1.0371306 0.9868051 0.9751483 2
## Ccl1 0.9490765 1.0061028 1.0299452 1.0591937 3
## Ccl9 1.0748156 1.0255019 1.0143933 1.0110047 1
## Ccl6 0.9836699 1.0019130 0.9749013 1.0098487 3
## Ccl3 1.0027344 1.0298867 1.0158420 0.9723747 2
## Stat5b 1.0067589 1.0196183 0.9994107 0.9936060 2
## Prkca 1.0188645 0.9967514 0.9482353 0.9719247 1
## Itga2 0.9682602 0.9558409 0.9855012 1.0177017 3
## Wnt5a 0.9772947 0.9830877 0.9540308 1.0548477 3
## Il17rb 0.9341813 0.9980961 0.9743833 1.0585646 3
## Ptger4 0.9765908 1.0148698 1.0042148 0.9546091 2
## Cd47 0.9976905 0.9901815 1.0223011 1.0073487 3
## Ager 0.9438379 1.0014070 0.9860253 1.0298349 3
## Tnf 1.0131602 1.0011471 1.0135716 0.9635546 1
## Gpsm3 0.9960939 1.0271684 0.9634463 0.9898771 2
## Jak2 1.0512738 0.9872834 0.9996479 0.9795569 1
## Il33 1.0075621 0.9593182 0.9695374 1.0968678 3
按照聚类顺序将聚类1和2的数据提取出来
order_row = A$tree_row$order #记录热图的行排序
data_pos = data_pos[order_row,] # 按照热图的顺序,重新排原始数据
data_pos = data.frame(rownames(data_pos),data_pos,check.names =T) # 将行名加到表格数据中
data_pos =data_pos[(data_pos$clus %in% c("1","2")),]
#或者data_pos =data_pos[(data_pos$clus < 3),]
data_pos #查看数据
## rownames.data_pos. X12h X24h X3d X7d clus_a
## Prkca Prkca 1.0188645 0.9967514 0.9482353 0.9719247 1
## Ccr2 Ccr2 1.0781580 0.9166186 0.9737452 0.9468374 1
## Jak2 Jak2 1.0512738 0.9872834 0.9996479 0.9795569 1
## Cdk19 Cdk19 1.0834689 0.9774149 0.9731305 1.0011084 1
## Ccl9 Ccl9 1.0748156 1.0255019 1.0143933 1.0110047 1
## Tlr4 Tlr4 1.0503438 1.0367557 1.0205949 0.9283988 1
## Il17ra Il17ra 1.0549646 1.0201907 0.9829150 0.9061829 1
## Tlr3 Tlr3 1.0712287 0.9860523 1.0268784 0.9320602 1
## Tnf Tnf 1.0131602 1.0011471 1.0135716 0.9635546 1
## S100a8 S100a8 1.1048567 0.9885955 1.1913848 0.9661311 1
## Il1rl1 Il1rl1 1.0475045 0.9948235 1.0268230 1.0106219 1
## Pla2g4a Pla2g4a 1.0225494 0.9872852 1.0146842 1.0025812 1
## Tnfsf4 Tnfsf4 0.9327795 1.0742431 0.9676329 1.0100173 2
## Adam8 Adam8 1.0053608 1.0149603 1.0010803 1.0082890 2
## Ccr5 Ccr5 1.0379514 1.0791485 0.9571004 0.9994896 2
## Gpsm3 Gpsm3 0.9960939 1.0271684 0.9634463 0.9898771 2
## Ccl4 Ccl4 0.9173600 1.0137382 0.9935696 0.8478784 2
## Ptger4 Ptger4 0.9765908 1.0148698 1.0042148 0.9546091 2
## Tlr9 Tlr9 0.9845119 1.0433439 1.0179088 0.9331340 2
## Ccl3 Ccl3 1.0027344 1.0298867 1.0158420 0.9723747 2
## Ldlr Ldlr 0.9951658 1.0412401 1.0044728 0.9757518 2
## Tlr2 Tlr2 0.9925440 1.0575861 0.9961778 0.9766016 2
## Pde2a Pde2a 0.9950061 1.0437024 1.0036394 0.9912396 2
## Tnfrsf1a Tnfrsf1a 1.0196453 1.0356289 0.9909272 0.9341439 2
## Clock Clock 1.0138973 1.0231884 1.0113094 0.9766799 2
## Serpine1 Serpine1 0.9992854 1.0101892 0.9987391 0.9768985 2
## Fabp4 Fabp4 1.0056085 1.0287909 0.9618795 0.9613461 2
## Tnip1 Tnip1 0.9967821 1.0371306 0.9868051 0.9751483 2
## Stat5b Stat5b 1.0067589 1.0196183 0.9994107 0.9936060 2
data_pos_reorder=data_pos[,-6]
write.table(data_pos_reorder,file="data_pos_reorder.txt",row.names=FALSE,quote = FALSE,sep='\t') #输出结果,按照热图中的顺序
2. 对抑炎数据做聚类分析并提取所需信息
读入抑炎数据并对数据做简单处理
data_neg<-read_excel("demo3.xlsx",2) #读入excel数据
data_neg<-as.data.frame(data_neg) # 将data转换为data.frame格式
rownames(data_neg)<-data_neg[,1] #将第一列数据设置为行名
data_neg<-data_neg[,-1] #去除重复的第一列
colnames(data_neg) = gsub("_expression","",colnames(data_neg))
head(data_neg)
## 12h 24h 3d 7d
## Il10 0.8684374 0.9661305 0.9805500 1.053469
## Adora1 0.9303991 0.9886013 0.9663481 1.056422
## Il2ra 0.9800390 0.9831342 0.9467422 1.011937
## Tnfaip6 0.9267055 1.0009174 1.0224794 1.063077
## Tyro3 0.9413618 0.9496315 0.9760268 1.060393
## Gata3 0.9527483 0.9656051 0.9631235 1.045775
绘图查看基因聚类情况
B<-pheatmap(data_neg,fontsize_col=12,fontsize_row=8,
show_rownames=F,show_colnames = T,
cluster_row=T,cluster_col=F,scale="row",
treeheight_row=25,treeheight_col=25,
border_color=NA,main="Heatmap",
color = colorRampPalette(c("blue", "white", "red"))(50),
cutree_rows=2) # 画图
ggsave(B,filename = "demo3_neg.pdf",width=3,height=8)
ggsave(B,filename = "demo3_neg.png",width=3,height=8)
从上图发现,基因按照表达模式可分为2大类,其中1类是想要的数据。
那如何提取这部分数据呢?
首先将表达数据与聚类分类信息合并
clus_b=cutree(B$tree_row,2) #获取按照3块,得到聚类模块信息
table(clus_b)# 每个聚类模块的数量
## clus_b
## 1 2
## 52 30
clus_b<-as.data.frame(clus_b)
data_neg= cbind(data_neg, clus_b)
head(data_pos) #查看合并后数据
## rownames.data_pos. X12h X24h X3d X7d clus_a
## Prkca Prkca 1.018865 0.9967514 0.9482353 0.9719247 1
## Ccr2 Ccr2 1.078158 0.9166186 0.9737452 0.9468374 1
## Jak2 Jak2 1.051274 0.9872834 0.9996479 0.9795569 1
## Cdk19 Cdk19 1.083469 0.9774149 0.9731305 1.0011084 1
## Ccl9 Ccl9 1.074816 1.0255019 1.0143933 1.0110047 1
## Tlr4 Tlr4 1.050344 1.0367557 1.0205949 0.9283988 1
按照聚类顺序将聚类1和2的数据提取出来
order_row = B$tree_row$order #记录热图的行排序
data_neg = data_neg[order_row,] # 按照热图的顺序,重新排原始数据
data_neg = data.frame(rownames(data_neg),data_neg,check.names =T) # 将行名加到表格数据中
table(data_neg$clus)
##
## 1 2
## 52 30
data_neg =data_neg[(data_neg$clus %in% c("1")),]
#或者data_neg =data_neg[(data_neg$clus < 2),]
head(data_neg)
## rownames.data_neg. X12h X24h X3d X7d clus_b
## Chrna7 Chrna7 0.9867821 0.9545026 0.9598899 0.9981845 1
## Aoah Aoah 1.0066938 0.9492302 0.9263789 1.0097112 1
## Pbk Pbk 1.0071100 0.9945398 1.0009174 1.0369612 1
## Il2 Il2 0.9414986 0.9277887 0.9402898 1.1531456 1
## Apoe Apoe 0.9638917 0.9554340 0.9667258 1.0191987 1
## Apoa1 Apoa1 0.9878721 0.9745789 0.9888851 1.0551538 1
data_neg_reorder=data_neg[,-6]
write.table(data_neg_reorder,file="data_neg_reorder.txt",row.names=FALSE,quote = FALSE,sep='\t') #输出结果,按照热图中的顺序
3. 限定pos组数据的颜色,统一颜色标尺
首席清除环境,加载R包并读入上游筛选出的促炎数据
rm(list = ls()) #清除环境内存
library(pheatmap) #加载pheatmap包
library(readxl) #加载readxl包
data=read.table('data_pos_reorder.txt',sep = '\t',stringsAsFactors = F,header = T)
data<-as.data.frame(data) # 将data转换为data.frame格式
rownames(data)<-data[,1] #将第一列数据设置为行名
data<-data[,-1] #去除重复的第一列
colnames(data) = gsub("X","",colnames(data))
限定颜范围
ys <- c(seq(-1,-0.01,by=0.01),seq(0,1,by=0.01))#设定颜色范围数
绘图(促炎热图)
p1=pheatmap(data,fontsize_col=8,fontsize_row=7,
show_rownames=T,show_colnames = T,
cluster_row=T,cluster_col=F,scale="row",
treeheight_row=25,treeheight_col=25,
border_color=NA,main="Heatmap",
color = c(colorRampPalette(colors = c("blue","white"))(length(ys)/2),colorRampPalette(colors = c("white","red"))(length(ys)/2)),
legend_breaks=seq(-1,1,0.5),
breaks=ys)
ggsave(p1,filename = "demo3_1.pdf",width=3,height=5)
ggsave(p1,filename = "demo3_1.png",width=3,height=5)
4. 限定neg组数据的颜色,统一颜色标尺
首席清除环境,加载R包并读入上游筛选出的促炎数据
rm(list = ls()) #清除环境内存
library(pheatmap) #加载pheatmap包
library(readxl) #加载readxl包
data=read.table('data_neg_reorder.txt',sep = '\t',stringsAsFactors = F,header = T)
data<-as.data.frame(data) # 将data转换为data.frame格式
rownames(data)<-data[,1] #将第一列数据设置为行名
data<-data[,-1] #去除重复的第一列
colnames(data) = gsub("X","",colnames(data))
限定颜范围
ys <- c(seq(-1,-0.01,by=0.01),seq(0,1,by=0.01))#设定颜色范围数
绘图(抑炎热图)
p2=pheatmap(data,fontsize_col=8,fontsize_row=7,
show_rownames=T,show_colnames = T,
cluster_row=T,cluster_col=F,scale="row",
treeheight_row=25,treeheight_col=25,
border_color=NA,main="Heatmap",
color = c(colorRampPalette(colors = c("blue","white"))(length(ys)/2),colorRampPalette(colors = c("white","red"))(length(ys)/2)),
legend_breaks=seq(-1,1,0.5),
breaks=ys)
ggsave(p2,filename = "demo3_2.pdf",width=3,height=8)
ggsave(p2,filename = "demo3_2.png",width=3,height=8)
5. 合并两张图片
最后可以用AI组合两个结果,也许R包也可以,还没有学。。。成图如下:整体体现出促炎反应随着时间减弱,抑炎随着时间增强,两个功能呈现此消彼长的趋势,符合实验表型。
显示运行环境
sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_China.936
## [2] LC_CTYPE=Chinese (Simplified)_China.936
## [3] LC_MONETARY=Chinese (Simplified)_China.936
## [4] LC_NUMERIC=C
## [5] LC_TIME=Chinese (Simplified)_China.936
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] export_0.2.2 readxl_1.3.1 pheatmap_1.0.12 ggplot2_3.2.1
##
## loaded via a namespace (and not attached):
## [1] rgl_0.100.50 Rcpp_1.0.3 lattice_0.20-38
## [4] tidyr_1.0.2 assertthat_0.2.1 digest_0.6.24
## [7] mime_0.9 R6_2.4.1 cellranger_1.1.0
## [10] backports_1.1.5 evaluate_0.14 pillar_1.4.3
## [13] gdtools_0.2.1 rlang_0.4.4 lazyeval_0.2.2
## [16] uuid_0.1-4 miniUI_0.1.1.1 data.table_1.12.8
## [19] flextable_0.5.9 rmarkdown_2.1 webshot_0.5.2
## [22] stringr_1.4.0 htmlwidgets_1.5.1 munsell_0.5.0
## [25] shiny_1.4.0 broom_0.5.5 compiler_3.6.2
## [28] httpuv_1.5.2 xfun_0.12 pkgconfig_2.0.3
## [31] systemfonts_0.1.1 base64enc_0.1-3 rvg_0.2.4
## [34] htmltools_0.4.0 tidyselect_1.0.0 tibble_2.1.3
## [37] crayon_1.3.4 dplyr_0.8.4 withr_2.1.2
## [40] later_1.0.0 grid_3.6.2 nlme_3.1-142
## [43] jsonlite_1.6.1 xtable_1.8-4 gtable_0.3.0
## [46] lifecycle_0.1.0 magrittr_1.5 scales_1.1.0
## [49] zip_2.0.4 stringi_1.4.6 promises_1.1.0
## [52] xml2_1.2.2 vctrs_0.2.3 generics_0.0.2
## [55] openxlsx_4.1.4 stargazer_5.2.2 RColorBrewer_1.1-2
## [58] tools_3.6.2 manipulateWidget_0.10.1 glue_1.3.1
## [61] officer_0.3.8 purrr_0.3.3 crosstalk_1.0.0
## [64] fastmap_1.0.1 yaml_2.2.1 colorspace_1.4-1
## [67] knitr_1.28
往期回顾
R绘图|ggplot2散点图的绘制
R绘图|pheatmap热图绘制——基础篇
R绘图|pheatmap热图绘制——中阶篇
今天的内容就到这里~~,更多内容可关注公共号“YJY技能修炼”~~