clusterProfiler: universal enrichment tool for functional and comparative study(Y叔)
1 总览
2 介绍
2.1 术语
2.1.1 gene sets and pathway
- gene set:无序但功能相关
- pathway:有序且功能相关,不考虑上下游关系可认为是一个gene set
2.1.2 Gene Ontology (GO)
GO定义基因功能的概念和类别以及 概念间的相互关系,包括三种
- molecular function(MF):基因产物的分子功能
- cellular component(CC)
- biological process(BP):多种基因产物参与的通路或者复杂过程
GO术语被组织在一个有向无环图中,术语之间的边表示父子关系。
3 功能富集分析方法
3.1 Over Representation Analysis
Over Representation Analysis (ORA) (Boyle et al. 2004) is a widely used approach to determine whether known biological functions or processes are over-represented (= enriched) in an experimentally-derived gene list, e.g. a list of differentially expressed genes (DEGs).
3.2 Gene Set Enrichment Analysis(GSEA)
不人为设定阈值,利用到每个基因的数据,以基因集形式比较在gene sets中的分布,以发现比较小但是较为一致的改变。
包含三种主要元素
- Calculation of an Enrichment Score.
ER分数代表多大程度上分析的基因集合S在ranked list L的顶部或者底部富集。在遍历集合listL中,如果发现了存在于S的基因increasing a running-sum statistic ,不在于S中,则下降。 - Esimation of Significance Level of ES. (p value)
- Adjustment for Multiple Hypothesis Testing. (q value)
GSEA算法有两种,“DOSE”以及“FGSEA”。DOSE中有by=参数设置,默认为“DOSE”。
3.3 Leading edge analysis and core enriched genes
Leading edge analysis reports Tags to indicate the percentage of genes contributing to the enrichment score, List to indicate where in the list the enrichment score is attained and Signal for enrichment signal strength.
类似于分析hub gene?
4 通用的富集分析
clusterProfiler支持许多oncology/通路的超几何检验以及基因集富集分析,此外,其还提供enricher
功能用于超几何检验以及GSEA
功能已满足用户自定义的注释。他们接受两个额外的参数TERM2GENE and TERM2NAME
。TERM2GENE是一个data.frame,第一列是term ID,二列是对应的基因;TERM2NAME 是一个data.frame,第一列是term ID,二列是对应的term name。
4.1 input
- for over representation analysis
gene ID的向量,可以有DESeq2等差异分析获得 - for GSEA
a ranked list of genes,包括3个特征:- numeric vector:fold changenumeric
- named vector: every number was named by the corresponding gene ID
- sorted vector: number should be sorted in decreasing order
创建该geneList方法如下, - 首先导入csv文件,第一列是不重复的gene_IDs,第二列是fold change。
- 按照如下方法创建自己的GeneList
d <- read.csv(your_csv_file)
## assume that 1st column is ID
## 2nd column is fold change
## feature 1: numeric vector
geneList <- d[,2]
## feature 2: named vector(vector命名)
names(geneList) <- as.character(d[,1])
## feature 3: decreasing order(vector降序)
geneList <- sort(geneList, decreasing = TRUE)
4.2 WikiPathways analysis
4.3 Cell marker
4.4 MsigDb analysis
可以下载GMT文件Broad Institute,然后用
read.gmt
解析文件用于enricher()
和GSEA()
。R包msigdbr已经整理打包了MsigDB基因集,可以直接被clusterProfile采用。
library(msigdbr)
msigdbr_show_species()
## [1] "Bos taurus" "Caenorhabditis elegans"
## [3] "Canis lupus familiaris" "Danio rerio"
## [5] "Drosophila melanogaster" "Gallus gallus"
## [7] "Homo sapiens" "Mus musculus"
## [9] "Rattus norvegicus" "Saccharomyces cerevisiae"
## [11] "Sus scrofa"
#retrieve all human gene sets
m_df <- msigdbr(species = "Homo sapiens")
head(m_df, 2) %>% as.data.frame
## gs_name gs_id gs_cat gs_subcat human_gene_symbol
## 1 AAACCAC_MIR140 M12609 C3 MIR ABCC4
## 2 AAACCAC_MIR140 M12609 C3 MIR ABRAXAS2
## species_name entrez_gene gene_symbol sources
## 1 Homo sapiens 10257 ABCC4 <NA>
## 2 Homo sapiens 23172 ABRAXAS2 <NA>
# retrieve specific collection. Here we use C6, oncogenic gene sets as an example
m_t2g <- msigdbr(species = "Homo sapiens", category = "C6") %>%
dplyr::select(gs_name, entrez_gene)
head(m_t2g)
## # A tibble: 6 x 2
## gs_name entrez_gene
## <chr> <int>
## 1 AKT_UP_MTOR_DN.V1_DN 25864
## 2 AKT_UP_MTOR_DN.V1_DN 95
## 3 AKT_UP_MTOR_DN.V1_DN 137872
## 4 AKT_UP_MTOR_DN.V1_DN 134
## 5 AKT_UP_MTOR_DN.V1_DN 55326
## 6 AKT_UP_MTOR_DN.V1_DN 216
em <- enricher(gene, TERM2GENE=m_t2g)
em2 <- GSEA(geneList, TERM2GENE = m_t2g)
head(em)
## ID
## RPS14_DN.V1_DN RPS14_DN.V1_DN
## GCNP_SHH_UP_LATE.V1_UP GCNP_SHH_UP_LATE.V1_UP
## VEGF_A_UP.V1_DN VEGF_A_UP.V1_DN
## PRC2_EZH2_UP.V1_DN PRC2_EZH2_UP.V1_DN
## CSR_LATE_UP.V1_UP CSR_LATE_UP.V1_UP
## ATF2_S_UP.V1_UP ATF2_S_UP.V1_UP
## Description GeneRatio
## RPS14_DN.V1_DN RPS14_DN.V1_DN 22/185
## GCNP_SHH_UP_LATE.V1_UP GCNP_SHH_UP_LATE.V1_UP 16/185
## VEGF_A_UP.V1_DN VEGF_A_UP.V1_DN 15/185
## PRC2_EZH2_UP.V1_DN PRC2_EZH2_UP.V1_DN 14/185
## CSR_LATE_UP.V1_UP CSR_LATE_UP.V1_UP 12/185
## ATF2_S_UP.V1_UP ATF2_S_UP.V1_UP 12/185
## BgRatio pvalue p.adjust
## RPS14_DN.V1_DN 185/10962 4.855041e-13 8.205020e-11
## GCNP_SHH_UP_LATE.V1_UP 186/10962 9.357887e-08 7.907414e-06
## VEGF_A_UP.V1_DN 193/10962 8.887046e-07 5.006369e-05
## PRC2_EZH2_UP.V1_DN 190/10962 3.907976e-06 1.651120e-04
## CSR_LATE_UP.V1_UP 166/10962 2.368079e-05 8.004105e-04
## ATF2_S_UP.V1_UP 190/10962 8.879028e-05 2.143651e-03
## qvalue
## RPS14_DN.V1_DN 6.694846e-11
## GCNP_SHH_UP_LATE.V1_UP 6.452017e-06
## VEGF_A_UP.V1_DN 4.084923e-05
## PRC2_EZH2_UP.V1_DN 1.347223e-04
## CSR_LATE_UP.V1_UP 6.530911e-04
## ATF2_S_UP.V1_UP 1.749102e-03
## geneID
## RPS14_DN.V1_DN 10874/55388/991/9493/1062/4605/9133/23397/79733/9787/55872/83461/54821/51659/9319/9055/10112/4174/5105/2532/7021/79901
## GCNP_SHH_UP_LATE.V1_UP 55388/7153/79733/6241/9787/51203/983/9212/1111/9319/9055/3833/6790/4174/3169/1580
## VEGF_A_UP.V1_DN 8318/9493/1062/9133/10403/6241/9787/4085/332/3832/7272/891/23362/2167/10234
## PRC2_EZH2_UP.V1_DN 8318/55388/4605/23397/9787/55355/10460/81620/2146/7272/9212/11182/3887/24137
## CSR_LATE_UP.V1_UP 55143/2305/4605/6241/11065/55872/983/332/2146/51659/9319/1580
## ATF2_S_UP.V1_UP 4312/6280/2305/9493/7153/2146/10112/3887/8544/22885/10234/10974
## Count
## RPS14_DN.V1_DN 22
## GCNP_SHH_UP_LATE.V1_UP 16
## VEGF_A_UP.V1_DN 15
## PRC2_EZH2_UP.V1_DN 14
## CSR_LATE_UP.V1_UP 12
## ATF2_S_UP.V1_UP 12
head(em2)
## ID Description setSize
## LEF1_UP.V1_DN LEF1_UP.V1_DN LEF1_UP.V1_DN 186
## LTE2_UP.V1_UP LTE2_UP.V1_UP LTE2_UP.V1_UP 183
## RAF_UP.V1_DN RAF_UP.V1_DN RAF_UP.V1_DN 189
## IL2_UP.V1_DN IL2_UP.V1_DN IL2_UP.V1_DN 182
## ATF2_S_UP.V1_DN ATF2_S_UP.V1_DN ATF2_S_UP.V1_DN 181
## ATF2_UP.V1_DN ATF2_UP.V1_DN ATF2_UP.V1_DN 177
## enrichmentScore NES pvalue
## LEF1_UP.V1_DN -0.4857056 -1.987140 0.001381215
## LTE2_UP.V1_UP -0.3938357 -1.608416 0.001383126
## RAF_UP.V1_DN -0.5521243 -2.254389 0.001383126
## IL2_UP.V1_DN -0.4252756 -1.732758 0.001388889
## ATF2_S_UP.V1_DN -0.4460160 -1.815767 0.001394700
## ATF2_UP.V1_DN -0.5008413 -2.032113 0.001402525
## p.adjust qvalues rank
## LEF1_UP.V1_DN 0.01354839 0.007394831 1705
## LTE2_UP.V1_UP 0.01354839 0.007394831 2110
## RAF_UP.V1_DN 0.01354839 0.007394831 2035
## IL2_UP.V1_DN 0.01354839 0.007394831 2774
## ATF2_S_UP.V1_DN 0.01354839 0.007394831 2531
## ATF2_UP.V1_DN 0.01354839 0.007394831 2055
## leading_edge
## LEF1_UP.V1_DN tags=35%, list=14%, signal=31%
## LTE2_UP.V1_UP tags=33%, list=17%, signal=28%
## RAF_UP.V1_DN tags=43%, list=16%, signal=37%
## IL2_UP.V1_DN tags=39%, list=22%, signal=31%
## ATF2_S_UP.V1_DN tags=39%, list=20%, signal=32%
## ATF2_UP.V1_DN tags=38%, list=16%, signal=32%
## core_enrichment
## LEF1_UP.V1_DN 79153/22998/10325/23221/79170/9187/4300/51479/25956/7552/8644/11162/7168/1490/54463/65084/25837/80303/1809/3397/4208/11223/79762/6451/481/54795/51626/8991/5950/5627/64699/6414/10103/221078/9961/5874/56898/90865/83989/5002/6653/56034/55314/85458/956/54502/4832/5783/1363/53832/54985/80221/214/50853/51149/57088/5157/2947/79932/7031/6505/9338/25924/80736/3169/10551
## LTE2_UP.V1_UP 23074/64759/1601/6558/7138/11138/10497/81031/665/2878/3554/4925/55005/9252/2034/5283/54869/5025/6019/65124/10979/51435/26999/4212/444/23588/55556/2954/93129/79762/54453/79789/316/79679/151011/51280/9630/57037/5627/9867/55198/6653/10455/5101/65055/8309/9429/23303/9590/253190/4982/4128/4680/1846/51760/80129/4857/5507/3158/5304/10974
## RAF_UP.V1_DN 23314/221037/1960/51340/55105/10265/22996/9687/5357/1952/12/4602/9231/596/51454/57419/595/4256/23022/8777/7837/7042/3397/57613/323/1831/6451/54843/7358/2353/8773/6938/8991/64699/57007/23389/10560/7227/3485/79068/10769/4254/26018/3480/2674/23327/857/116039/6542/2690/1955/1363/26353/7033/5376/89927/51363/8821/2239/6947/4886/214/3487/3667/5157/54847/54898/7031/6505/57535/10451/18/771/80129/5174/5507/56521/8839/8614/5241/10551/57758
## IL2_UP.V1_DN 51276/10858/5333/2977/79874/10014/7773/64798/26959/6588/579/777/5158/23361/9915/5087/64759/2535/6123/7010/51285/126353/80201/79841/1028/2322/4222/947/9187/225/9976/5592/54996/3202/55779/23148/5334/8605/57326/26115/54453/55766/64927/6591/55204/9940/57608/653483/9886/10279/55805/10628/901/9891/7102/27244/79921/4674/8292/5125/79843/6857/79940/23541/3357/50853/9863/79932/90362/55663/4250
## ATF2_S_UP.V1_DN 8714/53616/55667/51279/4642/7130/7139/2/710/80352/4608/11145/881/2252/3399/22846/10687/12/1397/1301/3554/2619/26011/78987/9284/1306/6586/9627/4054/54813/115207/55890/26249/4803/11075/3400/79789/8527/2353/10580/79987/80760/5493/90865/3485/3751/2202/2199/10610/1047/1294/23492/3908/5364/658/5350/2121/2331/4330/9737/4982/57088/8490/9358/6424/1307/3708/125/1311/56521/10351
## ATF2_UP.V1_DN 11145/9738/2252/8660/10687/5357/80201/12/22998/3554/2619/23556/8840/9284/57419/5737/6586/9627/8863/10085/115207/55890/23015/629/3212/3400/50650/8527/2353/79987/80760/9668/1012/10614/5493/8553/23126/90865/3485/3751/2202/57212/2199/10610/3075/6387/22834/1909/10129/57161/5350/2121/2331/4330/2952/4982/57088/9358/6424/6038/51313/1311/56521/10351/5304/4239/79901
5 Gene Ontology Analysis
GO分析groupGO()、richer GO()和gseGO()支持拥有可用OrgDb的物种。Bioconductor已经为大约20个物种提供了OrgDb,例如:人类org.Hs.eg.db,小鼠org.Mm.eg.db。
如果你有一个GO注释文数据(data.frame格式,第一列gene_ID,第二列GO ID),你可以用enricher()
和gseGO()
进行over-representation test and gene set enrichment analysis.
如果基因只是被直接注释,那么基因也应该同时被父系GO nodes注释(非直接的注释)。如果用户只有直接注释,可以将注释传递给buildGOmap
,它将推断出非直接注释并且生成适合enricher()
和gseGO()
的data.frame。
5.1 GO classification
groupGO
是基于特定level的GO分布的基因分类方法。举例如下:
library(clusterProfiler)
data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]
gene.df <- bitr(gene, fromType = "ENTREZID",
toType = c("ENSEMBL", "SYMBOL"),
OrgDb = org.Hs.eg.db)
# bitr是ID转化工具
head(gene.df)
## ENTREZID ENSEMBL SYMBOL
## 1 4312 ENSG00000196611 MMP1
## 2 8318 ENSG00000093009 CDC45
## 3 10874 ENSG00000109255 NMU
## 4 55143 ENSG00000134690 CDCA8
## 5 55388 ENSG00000065328 MCM10
## 6 991 ENSG00000117399 CDC20
ggo <- groupGO(gene = gene,
OrgDb = org.Hs.eg.db,
ont = "CC",
level = 3,
readable = TRUE)
head(ggo)
## ID Description Count
## GO:0005886 GO:0005886 plasma membrane 55
## GO:0005628 GO:0005628 prospore membrane 0
## GO:0005789 GO:0005789 endoplasmic reticulum membrane 8
## GO:0019867 GO:0019867 outer membrane 3
## GO:0031090 GO:0031090 organelle membrane 16
## GO:0034357 GO:0034357 photosynthetic membrane 0
## GeneRatio
## GO:0005886 55/207
## GO:0005628 0/207
## GO:0005789 8/207
## GO:0019867 3/207
## GO:0031090 16/207
## GO:0034357 0/207
## geneID
## GO:0005886 S100A9/MELK/S100A8/MARCO/ASPM/CXCL10/LAMP3/CEP55/UGT8/UBE2C/SLC7A5/CXCL9/FADS2/MSLN/IL1R2/KIF18A/S100P/GZMB/TRAT1/GABRP/AQP9/GPR19/SLC2A6/KIF20A/LAG3/NUDT1/CACNA1D/VSTM4/ITPR1/SYT17/SLC16A4/CORIN/KCNK15/CA12/KCNE4/HLA-DQA1/ADH1B/PDZK1/C7/ACKR1/COL17A1/PSD3/EMCN/SLC44A4/LRP2/NLGN4X/MAPT/ERBB4/CX3CR1/LAMP5/ABCA8/STEAP4/PTPRT/TMC5/CYBRD1
## GO:0005628
## GO:0005789 FADS2/CDK1/CHODL/ITPR1/HLA-DQA1/CYP4F8/CYP4B1/FMO5
## GO:0019867 BCL2A1/MAOB/PGR
## GO:0031090 MARCO/BCL2A1/LAMP3/DUSP2/SLC2A6/DTL/NUDT1/MAOB/ITPR1/GASK1B/HLA-DQA1/LRP2/LAMP5/STEAP4/PGR/CYBRD1
input parameters of gene is a vector of gene IDs (can be any ID type that supported by corresponding OrgDb).
If readable is setting to TRUE, the input gene IDs will be converted to gene symbols.
5.2 GO over-representation test(差异表达基因的GO富集分析)
ego <- enrichGO(gene = gene,
universe = names(geneList),
OrgDb = org.Hs.eg.db,
ont = "CC",
pAdjustMethod = "BH",
pvalueCutoff = 0.01,
qvalueCutoff = 0.05,
readable = TRUE)
head(ego)
## ID
## GO:0005819 GO:0005819
## GO:0005876 GO:0005876
## GO:0000779 GO:0000779
## GO:0072686 GO:0072686
## GO:0000775 GO:0000775
## GO:0000776 GO:0000776
## Description
## GO:0005819 spindle
## GO:0005876 spindle microtubule
## GO:0000779 condensed chromosome, centromeric region
## GO:0072686 mitotic spindle
## GO:0000775 chromosome, centromeric region
## GO:0000776 kinetochore
## GeneRatio BgRatio pvalue p.adjust
## GO:0005819 25/200 272/11816 4.695505e-12 1.493171e-09
## GO:0005876 12/200 48/11816 1.623758e-11 2.581776e-09
## GO:0000779 15/200 91/11816 2.808998e-11 2.628919e-09
## GO:0072686 15/200 92/11816 3.306816e-11 2.628919e-09
## GO:0000775 18/200 154/11816 1.061302e-10 6.749884e-09
## GO:0000776 15/200 107/11816 3.047081e-10 1.614953e-08
## qvalue
## GO:0005819 1.378996e-09
## GO:0005876 2.384361e-09
## GO:0000779 2.427899e-09
## GO:0072686 2.427899e-09
## GO:0000775 6.233756e-09
## GO:0000776 1.491466e-08
## geneID
## GO:0005819 CDCA8/CDC20/KIF23/CENPE/ASPM/DLGAP5/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1/KIF18A/BIRC5/KIF11/TTK/AURKB/PRC1/KIFC1/KIF18B/KIF20A/AURKA/CCNB1/KIF4A
## GO:0005876 CENPE/SKA1/NUSAP1/CDK1/KIF18A/BIRC5/KIF11/AURKB/PRC1/KIF18B/AURKA/KIF4A
## GO:0000779 CENPE/NDC80/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/CDT1/BIRC5/NCAPG/AURKB/AURKA/CCNB1
## GO:0072686 KIF23/CENPE/ASPM/SKA1/NUSAP1/TPX2/TACC3/CDK1/MAD2L1/KIF18A/KIF11/AURKB/KIFC1/KIF18B/AURKA
## GO:0000775 CDCA8/CENPE/NDC80/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/TTK/NCAPG/AURKB/AURKA/CCNB1
## GO:0000776 CENPE/NDC80/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/TTK/AURKB/CCNB1
## Count
## GO:0005819 25
## GO:0005876 12
## GO:0000779 15
## GO:0072686 15
## GO:0000775 18
## GO:0000776 15
这里gene ID只要是OrgDb支持的都可以
User need to specify thekeyType
parameter to specify the input gene ID type.
ego2 <- enrichGO(gene = gene.df$ENSEMBL,
OrgDb = org.Hs.eg.db,
keyType = 'ENSEMBL',
ont = "CC",
pAdjustMethod = "BH",
pvalueCutoff = 0.01,
qvalueCutoff = 0.05)
Gene ID can be mapped to gene Symbol by using paramter readable=TRUE
or setReadable
function.
ego2 <- setReadable(ego2, OrgDb = org.Hs.eg.db)
5.2.1 drop specific GO terms or level(dropGO
)
5.2.2 test GO at sepcific level(gofilter
)
5.2.3 reduce redundancy of enriched GO terms(simplify
)
5.3 GSEA
ego3 <- gseGO(geneList = geneList,
OrgDb = org.Hs.eg.db,
ont = "CC",
nPerm = 1000,
minGSSize = 100,
maxGSSize = 500,
pvalueCutoff = 0.05,
verbose = FALSE)
6 KEGG analysis
6.1 KEGG over-representation test
data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]
kk <- enrichKEGG(gene = gene,
organism = 'hsa',
pvalueCutoff = 0.05)
head(kk)
Input ID type can be
kegg
,ncbi-geneid
,ncbi-proteinid
oruniprot
, an example can be found in the post.
6.2 KEGG Gene Set Enrichment Analysis
kk2 <- gseKEGG(geneList = geneList,
organism = 'hsa',
nPerm = 1000,
minGSSize = 120,
pvalueCutoff = 0.05,
verbose = FALSE)
head(kk2)
6.3 KEGG Module over-representation test
KEGG Module is a collection of manually defined function units. In some situation, KEGG Modules have a more straightforward interpretation.
mkk <- enrichMKEGG(gene = gene,
organism = 'hsa')
6.4 KEGG Module Gene Set Enrichment Analysis
mkk2 <- gseMKEGG(geneList = geneList,
organism = 'hsa')
7 MsigDb analysis
8 Reactome pathway analysis
9 Mesh enrichment analysis
10 Functional enrichment analysis of genomic coordinations
11. Biological theme comparison
12 功能富集分析结果可视化
支持DOSE
,clusterProfiler
,ReactomePA
,meshes
结果的可视化;支持over representation analysis (ORA) 和gene set enrichment analysis (GSEA)
多数情况下可视化过程先有DOSE
实施,然后传递给ggplot2
绘图
12.1 Bar plot
最为常见,通常以bar的颜色和高度体现enrichment scores(比如pvalue),gene counts或者ratio作为bar
library(DOSE)
data(geneList)
de <- names(geneList)[abs(geneList) > 2]
edo <- enrichDGN(de)
library(enrichplot)
barplot(edo, showCategory=20)
12.2 Dot plot
edo2 <- gseNCG(geneList, nPerm=10000)
p1 <- dotplot(edo, showCategory=30) + ggtitle("dotplot for ORA")
p2 <- dotplot(edo2, showCategory=30) + ggtitle("dotplot for GSEA")
library(cowplot)
plot_grid(p1, p2, ncol=2)
结果展示:
在案例中,gseNCG未富集到差异的genset,此外,目前gseNCG不推荐使用nPerm参数。
12.3 Gene-Concept Network
barplot
和dotplot
只显示最显著富集得到terms,如果想知道在显著富集的terms中有哪些基因?cnetplot
描绘了基因和生物学概念(GO,KEGG等)的关联。GSEA 结果同样可以,但是只是显示核心富集的基因
## convert gene ID to Symbol
edox <- setReadable(edo, 'org.Hs.eg.db', 'ENTREZID')
p1 <- cnetplot(edox, foldChange=geneList)
install.packages("ggnewscale")
library(ggnewscale)
## categorySize can be scaled by 'pvalue' or 'geneNum'
p2 <- cnetplot(edox, categorySize="pvalue", foldChange=geneList)
p3 <- cnetplot(edox, foldChange=geneList, circular = TRUE, colorEdge = TRUE)
cowplot::plot_grid(p1, p2, p3, ncol=3, labels=LETTERS[1:3], rel_widths=c(.8, .8, 1.2))
合并输出图片显示不美观,可以单独输出
node_label
可以设置label的类型
p1 <- cnetplot(edox, node_label="category")
p2 <- cnetplot(edox, node_label="gene")
p3 <- cnetplot(edox, node_label="all")
p4 <- cnetplot(edox, node_label="none")
cowplot::plot_grid(p1, p2, p3, p4, ncol=2, labels=LETTERS[1:4])
gene category (A), gene name (B), both gene category and gene name (C, default) and not to label at all (D).
12.4 Heatmap-like functional classification
heatplot
类似于cneplot,适合terms数目较多时使用
p1 <- heatplot(edox)
p2 <- heatplot(edox, foldChange=geneList)
cowplot::plot_grid(p1, p2, ncol=1, labels=LETTERS[1:2])
Heatmap plot of enriched terms. default (A), foldChange=geneList (B)
12.5 enrichment MAP
Enrichment map organizes enriched terms into a network with edges connecting overlapping gene sets. In this way, mutually overlapping gene sets are tend to cluster together, making it easy to identify functional module.
emapplot
支持超几何检验和GSEA分析的结果。pie_scale
设置node大小;
layout
设布局。
p1 <- emapplot(edo)
p2 <- emapplot(edo, pie_scale=1.5)
p3 <- emapplot(edo,layout="kk")
p4 <- emapplot(edo, pie_scale=1.5,layout="kk")
cowplot::plot_grid(p1, p2, p3, p4, ncol=2, labels=LETTERS[1:4])
Plot for results obtained from hypergeometric test and gene set enrichment analysis. default (A), pie_scale=1.5 (B), layout="kk" (C) and pie_scale=1.5,layout="kk" (D).
12.6 UnSet plot
可视化基因和基因集的关系;强调不同基因集中基因的重叠
对于over-representation analysis(ORA)分析,upsetplot
计算不同基因集合的重叠,并如下展示
install.packages("ggupset")
library(ggupset)
upsetplot(edo)
对于GSEA,将绘制不同分类中flod change的分布(例如,通路特异的,不同通路见重叠的)
upsetplot(kk2) # kk2不知道从哪里来的
12.7 GSEA结果表达分布的脊线图
ridgeplot
对GSEA富集的种类中的核心富集基因的分布进性可视化。帮助解读上调或者下调的基因
install.packages("ggridges")
ridgeplot(edo2)
12.8 running score and preranked list of GSEA result
gseaplot2
gseaplot2(edo2, geneSetID = 1, title = edo2$Description[1])
gseaplot2
also supports multile gene sets to be displayed on the same figure
gseaplot2(edo2, geneSetID = 1:3)
还可以通过pvalue_table参数显示pvalue table
gseaplot2(edo2, geneSetID = 1:3, pvalue_table = TRUE,
color = c("#E495A5", "#86B875", "#7DB0DD"), ES_geom = "dot")
12.9 富集的pathway可视化
pathview
用于KEGG pathway 的可视化
library("pathview") ##pathway未安装成功
hsa04110 <- pathview(gene.data = geneList,
pathway.id = "hsa04110",
species = "hsa",
limit = list(gene=max(abs(geneList)), cpd=1))
13 其他有用的工具
13.1 bitr
: Biological Id TranslatoR
eg = bitr(x, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")