2021-03-07 clusterProfiler 学习记录

clusterProfiler: universal enrichment tool for functional and comparative study(Y叔)

1 总览

图片.png

2 介绍

2.1 术语

2.1.1 gene sets and pathway

  • gene set:无序但功能相关
  • pathway:有序且功能相关,不考虑上下游关系可认为是一个gene set

2.1.2 Gene Ontology (GO)

GO定义基因功能的概念和类别以及 概念间的相互关系,包括三种

  • molecular function(MF):基因产物的分子功能
  • cellular component(CC)
  • biological process(BP):多种基因产物参与的通路或者复杂过程
    GO术语被组织在一个有向无环图中,术语之间的边表示父子关系。

3 功能富集分析方法

3.1 Over Representation Analysis

Over Representation Analysis (ORA) (Boyle et al. 2004) is a widely used approach to determine whether known biological functions or processes are over-represented (= enriched) in an experimentally-derived gene list, e.g. a list of differentially expressed genes (DEGs).

3.2 Gene Set Enrichment Analysis(GSEA)

不人为设定阈值,利用到每个基因的数据,以基因集形式比较在gene sets中的分布,以发现比较小但是较为一致的改变。
包含三种主要元素

  • Calculation of an Enrichment Score.
    ER分数代表多大程度上分析的基因集合S在ranked list L的顶部或者底部富集。在遍历集合listL中,如果发现了存在于S的基因increasing a running-sum statistic ,不在于S中,则下降。
  • Esimation of Significance Level of ES. (p value)
  • Adjustment for Multiple Hypothesis Testing. (q value)

GSEA算法有两种,“DOSE”以及“FGSEA”。DOSE中有by=参数设置,默认为“DOSE”。

3.3 Leading edge analysis and core enriched genes

Leading edge analysis reports Tags to indicate the percentage of genes contributing to the enrichment score, List to indicate where in the list the enrichment score is attained and Signal for enrichment signal strength.
类似于分析hub gene?

4 通用的富集分析

clusterProfiler支持许多oncology/通路的超几何检验以及基因集富集分析,此外,其还提供enricher功能用于超几何检验以及GSEA功能已满足用户自定义的注释。他们接受两个额外的参数TERM2GENE and TERM2NAME。TERM2GENE是一个data.frame,第一列是term ID,二列是对应的基因;TERM2NAME 是一个data.frame,第一列是term ID,二列是对应的term name。

4.1 input

  • for over representation analysis
    gene ID的向量,可以有DESeq2等差异分析获得
  • for GSEA
    a ranked list of genes,包括3个特征:
    • numeric vector:fold changenumeric
    • named vector: every number was named by the corresponding gene ID
    • sorted vector: number should be sorted in decreasing order
      创建该geneList方法如下,
    • 首先导入csv文件,第一列是不重复的gene_IDs,第二列是fold change。
    • 按照如下方法创建自己的GeneList
d <- read.csv(your_csv_file)
## assume that 1st column is ID
## 2nd column is fold change

## feature 1: numeric vector
geneList <- d[,2]  

## feature 2: named vector(vector命名)
names(geneList) <- as.character(d[,1])

## feature 3: decreasing order(vector降序)
geneList <- sort(geneList, decreasing = TRUE)

4.2 WikiPathways analysis

4.3 Cell marker

4.4 MsigDb analysis

图片.png

可以下载GMT文件Broad Institute,然后用read.gmt解析文件用于enricher()GSEA()
R包msigdbr已经整理打包了MsigDB基因集,可以直接被clusterProfile采用。

library(msigdbr)
msigdbr_show_species()
##  [1] "Bos taurus"               "Caenorhabditis elegans"  
##  [3] "Canis lupus familiaris"   "Danio rerio"             
##  [5] "Drosophila melanogaster"  "Gallus gallus"           
##  [7] "Homo sapiens"             "Mus musculus"            
##  [9] "Rattus norvegicus"        "Saccharomyces cerevisiae"
## [11] "Sus scrofa"
#retrieve all human gene sets
m_df <- msigdbr(species = "Homo sapiens")
head(m_df, 2) %>% as.data.frame
##          gs_name  gs_id gs_cat gs_subcat human_gene_symbol
## 1 AAACCAC_MIR140 M12609     C3       MIR             ABCC4
## 2 AAACCAC_MIR140 M12609     C3       MIR          ABRAXAS2
##   species_name entrez_gene gene_symbol sources
## 1 Homo sapiens       10257       ABCC4    <NA>
## 2 Homo sapiens       23172    ABRAXAS2    <NA>
# retrieve specific collection. Here we use C6, oncogenic gene sets as an example
m_t2g <- msigdbr(species = "Homo sapiens", category = "C6") %>% 
  dplyr::select(gs_name, entrez_gene)
head(m_t2g)
## # A tibble: 6 x 2
##   gs_name              entrez_gene
##   <chr>                      <int>
## 1 AKT_UP_MTOR_DN.V1_DN       25864
## 2 AKT_UP_MTOR_DN.V1_DN          95
## 3 AKT_UP_MTOR_DN.V1_DN      137872
## 4 AKT_UP_MTOR_DN.V1_DN         134
## 5 AKT_UP_MTOR_DN.V1_DN       55326
## 6 AKT_UP_MTOR_DN.V1_DN         216
em <- enricher(gene, TERM2GENE=m_t2g)
em2 <- GSEA(geneList, TERM2GENE = m_t2g)
head(em)

##                                            ID
## RPS14_DN.V1_DN                 RPS14_DN.V1_DN
## GCNP_SHH_UP_LATE.V1_UP GCNP_SHH_UP_LATE.V1_UP
## VEGF_A_UP.V1_DN               VEGF_A_UP.V1_DN
## PRC2_EZH2_UP.V1_DN         PRC2_EZH2_UP.V1_DN
## CSR_LATE_UP.V1_UP           CSR_LATE_UP.V1_UP
## ATF2_S_UP.V1_UP               ATF2_S_UP.V1_UP
##                                   Description GeneRatio
## RPS14_DN.V1_DN                 RPS14_DN.V1_DN    22/185
## GCNP_SHH_UP_LATE.V1_UP GCNP_SHH_UP_LATE.V1_UP    16/185
## VEGF_A_UP.V1_DN               VEGF_A_UP.V1_DN    15/185
## PRC2_EZH2_UP.V1_DN         PRC2_EZH2_UP.V1_DN    14/185
## CSR_LATE_UP.V1_UP           CSR_LATE_UP.V1_UP    12/185
## ATF2_S_UP.V1_UP               ATF2_S_UP.V1_UP    12/185
##                          BgRatio       pvalue     p.adjust
## RPS14_DN.V1_DN         185/10962 4.855041e-13 8.205020e-11
## GCNP_SHH_UP_LATE.V1_UP 186/10962 9.357887e-08 7.907414e-06
## VEGF_A_UP.V1_DN        193/10962 8.887046e-07 5.006369e-05
## PRC2_EZH2_UP.V1_DN     190/10962 3.907976e-06 1.651120e-04
## CSR_LATE_UP.V1_UP      166/10962 2.368079e-05 8.004105e-04
## ATF2_S_UP.V1_UP        190/10962 8.879028e-05 2.143651e-03
##                              qvalue
## RPS14_DN.V1_DN         6.694846e-11
## GCNP_SHH_UP_LATE.V1_UP 6.452017e-06
## VEGF_A_UP.V1_DN        4.084923e-05
## PRC2_EZH2_UP.V1_DN     1.347223e-04
## CSR_LATE_UP.V1_UP      6.530911e-04
## ATF2_S_UP.V1_UP        1.749102e-03
##                                                                                                                                        geneID
## RPS14_DN.V1_DN         10874/55388/991/9493/1062/4605/9133/23397/79733/9787/55872/83461/54821/51659/9319/9055/10112/4174/5105/2532/7021/79901
## GCNP_SHH_UP_LATE.V1_UP                                      55388/7153/79733/6241/9787/51203/983/9212/1111/9319/9055/3833/6790/4174/3169/1580
## VEGF_A_UP.V1_DN                                                   8318/9493/1062/9133/10403/6241/9787/4085/332/3832/7272/891/23362/2167/10234
## PRC2_EZH2_UP.V1_DN                                               8318/55388/4605/23397/9787/55355/10460/81620/2146/7272/9212/11182/3887/24137
## CSR_LATE_UP.V1_UP                                                               55143/2305/4605/6241/11065/55872/983/332/2146/51659/9319/1580
## ATF2_S_UP.V1_UP                                                               4312/6280/2305/9493/7153/2146/10112/3887/8544/22885/10234/10974
##                        Count
## RPS14_DN.V1_DN            22
## GCNP_SHH_UP_LATE.V1_UP    16
## VEGF_A_UP.V1_DN           15
## PRC2_EZH2_UP.V1_DN        14
## CSR_LATE_UP.V1_UP         12
## ATF2_S_UP.V1_UP           12
head(em2)

##                              ID     Description setSize
## LEF1_UP.V1_DN     LEF1_UP.V1_DN   LEF1_UP.V1_DN     186
## LTE2_UP.V1_UP     LTE2_UP.V1_UP   LTE2_UP.V1_UP     183
## RAF_UP.V1_DN       RAF_UP.V1_DN    RAF_UP.V1_DN     189
## IL2_UP.V1_DN       IL2_UP.V1_DN    IL2_UP.V1_DN     182
## ATF2_S_UP.V1_DN ATF2_S_UP.V1_DN ATF2_S_UP.V1_DN     181
## ATF2_UP.V1_DN     ATF2_UP.V1_DN   ATF2_UP.V1_DN     177
##                 enrichmentScore       NES      pvalue
## LEF1_UP.V1_DN        -0.4857056 -1.987140 0.001381215
## LTE2_UP.V1_UP        -0.3938357 -1.608416 0.001383126
## RAF_UP.V1_DN         -0.5521243 -2.254389 0.001383126
## IL2_UP.V1_DN         -0.4252756 -1.732758 0.001388889
## ATF2_S_UP.V1_DN      -0.4460160 -1.815767 0.001394700
## ATF2_UP.V1_DN        -0.5008413 -2.032113 0.001402525
##                   p.adjust     qvalues rank
## LEF1_UP.V1_DN   0.01354839 0.007394831 1705
## LTE2_UP.V1_UP   0.01354839 0.007394831 2110
## RAF_UP.V1_DN    0.01354839 0.007394831 2035
## IL2_UP.V1_DN    0.01354839 0.007394831 2774
## ATF2_S_UP.V1_DN 0.01354839 0.007394831 2531
## ATF2_UP.V1_DN   0.01354839 0.007394831 2055
##                                   leading_edge
## LEF1_UP.V1_DN   tags=35%, list=14%, signal=31%
## LTE2_UP.V1_UP   tags=33%, list=17%, signal=28%
## RAF_UP.V1_DN    tags=43%, list=16%, signal=37%
## IL2_UP.V1_DN    tags=39%, list=22%, signal=31%
## ATF2_S_UP.V1_DN tags=39%, list=20%, signal=32%
## ATF2_UP.V1_DN   tags=38%, list=16%, signal=32%
##                                                                                                                                                                                                                                                                                                                                                                                                                                                  core_enrichment
## LEF1_UP.V1_DN                                                                        79153/22998/10325/23221/79170/9187/4300/51479/25956/7552/8644/11162/7168/1490/54463/65084/25837/80303/1809/3397/4208/11223/79762/6451/481/54795/51626/8991/5950/5627/64699/6414/10103/221078/9961/5874/56898/90865/83989/5002/6653/56034/55314/85458/956/54502/4832/5783/1363/53832/54985/80221/214/50853/51149/57088/5157/2947/79932/7031/6505/9338/25924/80736/3169/10551
## LTE2_UP.V1_UP                                                                                                       23074/64759/1601/6558/7138/11138/10497/81031/665/2878/3554/4925/55005/9252/2034/5283/54869/5025/6019/65124/10979/51435/26999/4212/444/23588/55556/2954/93129/79762/54453/79789/316/79679/151011/51280/9630/57037/5627/9867/55198/6653/10455/5101/65055/8309/9429/23303/9590/253190/4982/4128/4680/1846/51760/80129/4857/5507/3158/5304/10974
## RAF_UP.V1_DN    23314/221037/1960/51340/55105/10265/22996/9687/5357/1952/12/4602/9231/596/51454/57419/595/4256/23022/8777/7837/7042/3397/57613/323/1831/6451/54843/7358/2353/8773/6938/8991/64699/57007/23389/10560/7227/3485/79068/10769/4254/26018/3480/2674/23327/857/116039/6542/2690/1955/1363/26353/7033/5376/89927/51363/8821/2239/6947/4886/214/3487/3667/5157/54847/54898/7031/6505/57535/10451/18/771/80129/5174/5507/56521/8839/8614/5241/10551/57758
## IL2_UP.V1_DN                                                  51276/10858/5333/2977/79874/10014/7773/64798/26959/6588/579/777/5158/23361/9915/5087/64759/2535/6123/7010/51285/126353/80201/79841/1028/2322/4222/947/9187/225/9976/5592/54996/3202/55779/23148/5334/8605/57326/26115/54453/55766/64927/6591/55204/9940/57608/653483/9886/10279/55805/10628/901/9891/7102/27244/79921/4674/8292/5125/79843/6857/79940/23541/3357/50853/9863/79932/90362/55663/4250
## ATF2_S_UP.V1_DN                                                               8714/53616/55667/51279/4642/7130/7139/2/710/80352/4608/11145/881/2252/3399/22846/10687/12/1397/1301/3554/2619/26011/78987/9284/1306/6586/9627/4054/54813/115207/55890/26249/4803/11075/3400/79789/8527/2353/10580/79987/80760/5493/90865/3485/3751/2202/2199/10610/1047/1294/23492/3908/5364/658/5350/2121/2331/4330/9737/4982/57088/8490/9358/6424/1307/3708/125/1311/56521/10351
## ATF2_UP.V1_DN                                                                             11145/9738/2252/8660/10687/5357/80201/12/22998/3554/2619/23556/8840/9284/57419/5737/6586/9627/8863/10085/115207/55890/23015/629/3212/3400/50650/8527/2353/79987/80760/9668/1012/10614/5493/8553/23126/90865/3485/3751/2202/57212/2199/10610/3075/6387/22834/1909/10129/57161/5350/2121/2331/4330/2952/4982/57088/9358/6424/6038/51313/1311/56521/10351/5304/4239/79901

5 Gene Ontology Analysis

GO分析groupGO()、richer GO()和gseGO()支持拥有可用OrgDb的物种。Bioconductor已经为大约20个物种提供了OrgDb,例如:人类org.Hs.eg.db,小鼠org.Mm.eg.db。
如果你有一个GO注释文数据(data.frame格式,第一列gene_ID,第二列GO ID),你可以用enricher()gseGO()进行over-representation test and gene set enrichment analysis.
如果基因只是被直接注释,那么基因也应该同时被父系GO nodes注释(非直接的注释)。如果用户只有直接注释,可以将注释传递给buildGOmap,它将推断出非直接注释并且生成适合enricher()gseGO()data.frame

5.1 GO classification

groupGO是基于特定level的GO分布的基因分类方法。举例如下:

library(clusterProfiler)
data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]
gene.df <- bitr(gene, fromType = "ENTREZID",
        toType = c("ENSEMBL", "SYMBOL"),
        OrgDb = org.Hs.eg.db) 
# bitr是ID转化工具
head(gene.df)
##   ENTREZID         ENSEMBL SYMBOL
## 1     4312 ENSG00000196611   MMP1
## 2     8318 ENSG00000093009  CDC45
## 3    10874 ENSG00000109255    NMU
## 4    55143 ENSG00000134690  CDCA8
## 5    55388 ENSG00000065328  MCM10
## 6      991 ENSG00000117399  CDC20
ggo <- groupGO(gene     = gene,
               OrgDb    = org.Hs.eg.db,
               ont      = "CC",
               level    = 3,
               readable = TRUE)

head(ggo)
##                    ID                    Description Count
## GO:0005886 GO:0005886                plasma membrane    55
## GO:0005628 GO:0005628              prospore membrane     0
## GO:0005789 GO:0005789 endoplasmic reticulum membrane     8
## GO:0019867 GO:0019867                 outer membrane     3
## GO:0031090 GO:0031090             organelle membrane    16
## GO:0034357 GO:0034357        photosynthetic membrane     0
##            GeneRatio
## GO:0005886    55/207
## GO:0005628     0/207
## GO:0005789     8/207
## GO:0019867     3/207
## GO:0031090    16/207
## GO:0034357     0/207
##                                                                                                                                                                                                                                                                                                                                                      geneID
## GO:0005886 S100A9/MELK/S100A8/MARCO/ASPM/CXCL10/LAMP3/CEP55/UGT8/UBE2C/SLC7A5/CXCL9/FADS2/MSLN/IL1R2/KIF18A/S100P/GZMB/TRAT1/GABRP/AQP9/GPR19/SLC2A6/KIF20A/LAG3/NUDT1/CACNA1D/VSTM4/ITPR1/SYT17/SLC16A4/CORIN/KCNK15/CA12/KCNE4/HLA-DQA1/ADH1B/PDZK1/C7/ACKR1/COL17A1/PSD3/EMCN/SLC44A4/LRP2/NLGN4X/MAPT/ERBB4/CX3CR1/LAMP5/ABCA8/STEAP4/PTPRT/TMC5/CYBRD1
## GO:0005628                                                                                                                                                                                                                                                                                                                                                 
## GO:0005789                                                                                                                                                                                                                                                                                               FADS2/CDK1/CHODL/ITPR1/HLA-DQA1/CYP4F8/CYP4B1/FMO5
## GO:0019867                                                                                                                                                                                                                                                                                                                                  BCL2A1/MAOB/PGR
## GO:0031090                                                                                                                                                                                                                                                MARCO/BCL2A1/LAMP3/DUSP2/SLC2A6/DTL/NUDT1/MAOB/ITPR1/GASK1B/HLA-DQA1/LRP2/LAMP5/STEAP4/PGR/CYBRD1

input parameters of gene is a vector of gene IDs (can be any ID type that supported by corresponding OrgDb).
If readable is setting to TRUE, the input gene IDs will be converted to gene symbols.

5.2 GO over-representation test(差异表达基因的GO富集分析)

ego <- enrichGO(gene          = gene,
                universe      = names(geneList),
                OrgDb         = org.Hs.eg.db,
                ont           = "CC",
                pAdjustMethod = "BH",
                pvalueCutoff  = 0.01,
                qvalueCutoff  = 0.05,
        readable      = TRUE)
head(ego)
##                    ID
## GO:0005819 GO:0005819
## GO:0005876 GO:0005876
## GO:0000779 GO:0000779
## GO:0072686 GO:0072686
## GO:0000775 GO:0000775
## GO:0000776 GO:0000776
##                                         Description
## GO:0005819                                  spindle
## GO:0005876                      spindle microtubule
## GO:0000779 condensed chromosome, centromeric region
## GO:0072686                          mitotic spindle
## GO:0000775           chromosome, centromeric region
## GO:0000776                              kinetochore
##            GeneRatio   BgRatio       pvalue     p.adjust
## GO:0005819    25/200 272/11816 4.695505e-12 1.493171e-09
## GO:0005876    12/200  48/11816 1.623758e-11 2.581776e-09
## GO:0000779    15/200  91/11816 2.808998e-11 2.628919e-09
## GO:0072686    15/200  92/11816 3.306816e-11 2.628919e-09
## GO:0000775    18/200 154/11816 1.061302e-10 6.749884e-09
## GO:0000776    15/200 107/11816 3.047081e-10 1.614953e-08
##                  qvalue
## GO:0005819 1.378996e-09
## GO:0005876 2.384361e-09
## GO:0000779 2.427899e-09
## GO:0072686 2.427899e-09
## GO:0000775 6.233756e-09
## GO:0000776 1.491466e-08
##                                                                                                                                                         geneID
## GO:0005819 CDCA8/CDC20/KIF23/CENPE/ASPM/DLGAP5/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1/KIF18A/BIRC5/KIF11/TTK/AURKB/PRC1/KIFC1/KIF18B/KIF20A/AURKA/CCNB1/KIF4A
## GO:0005876                                                                             CENPE/SKA1/NUSAP1/CDK1/KIF18A/BIRC5/KIF11/AURKB/PRC1/KIF18B/AURKA/KIF4A
## GO:0000779                                                            CENPE/NDC80/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/CDT1/BIRC5/NCAPG/AURKB/AURKA/CCNB1
## GO:0072686                                                           KIF23/CENPE/ASPM/SKA1/NUSAP1/TPX2/TACC3/CDK1/MAD2L1/KIF18A/KIF11/AURKB/KIFC1/KIF18B/AURKA
## GO:0000775                                           CDCA8/CENPE/NDC80/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/TTK/NCAPG/AURKB/AURKA/CCNB1
## GO:0000776                                                             CENPE/NDC80/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/TTK/AURKB/CCNB1
##            Count
## GO:0005819    25
## GO:0005876    12
## GO:0000779    15
## GO:0072686    15
## GO:0000775    18
## GO:0000776    15

这里gene ID只要是OrgDb支持的都可以
User need to specify thekeyType parameter to specify the input gene ID type.

ego2 <- enrichGO(gene         = gene.df$ENSEMBL,
                OrgDb         = org.Hs.eg.db,
                keyType       = 'ENSEMBL',
                ont           = "CC",
                pAdjustMethod = "BH",
                pvalueCutoff  = 0.01,
                qvalueCutoff  = 0.05)

Gene ID can be mapped to gene Symbol by using paramter readable=TRUE or setReadable function.

ego2 <- setReadable(ego2, OrgDb = org.Hs.eg.db)

5.2.1 drop specific GO terms or level(dropGO)

5.2.2 test GO at sepcific level(gofilter)

5.2.3 reduce redundancy of enriched GO terms(simplify)

5.3 GSEA

ego3 <- gseGO(geneList     = geneList,
              OrgDb        = org.Hs.eg.db,
              ont          = "CC",
              nPerm        = 1000,
              minGSSize    = 100,
              maxGSSize    = 500,
              pvalueCutoff = 0.05,
              verbose      = FALSE)

6 KEGG analysis

6.1 KEGG over-representation test

data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]

kk <- enrichKEGG(gene         = gene,
                 organism     = 'hsa',
                 pvalueCutoff = 0.05)
head(kk)

Input ID type can be kegg, ncbi-geneid, ncbi-proteinid or uniprot, an example can be found in the post.

6.2 KEGG Gene Set Enrichment Analysis

kk2 <- gseKEGG(geneList     = geneList,
               organism     = 'hsa',
               nPerm        = 1000,
               minGSSize    = 120,
               pvalueCutoff = 0.05,
               verbose      = FALSE)
head(kk2)

6.3 KEGG Module over-representation test

KEGG Module is a collection of manually defined function units. In some situation, KEGG Modules have a more straightforward interpretation.

mkk <- enrichMKEGG(gene = gene,
                   organism = 'hsa')

6.4 KEGG Module Gene Set Enrichment Analysis

mkk2 <- gseMKEGG(geneList = geneList,
                 organism = 'hsa')

7 MsigDb analysis

8 Reactome pathway analysis

9 Mesh enrichment analysis

10 Functional enrichment analysis of genomic coordinations

11. Biological theme comparison

12 功能富集分析结果可视化

支持DOSE,clusterProfiler,ReactomePA,meshes结果的可视化;支持over representation analysis (ORA) 和gene set enrichment analysis (GSEA)
多数情况下可视化过程先有DOSE实施,然后传递给ggplot2绘图

12.1 Bar plot

最为常见,通常以bar的颜色和高度体现enrichment scores(比如pvalue),gene counts或者ratio作为bar

library(DOSE)
data(geneList)
de <- names(geneList)[abs(geneList) > 2]

edo <- enrichDGN(de)
library(enrichplot)
barplot(edo, showCategory=20)
图片.png

12.2 Dot plot

edo2 <- gseNCG(geneList, nPerm=10000)
p1 <- dotplot(edo, showCategory=30) + ggtitle("dotplot for ORA")
p2 <- dotplot(edo2, showCategory=30) + ggtitle("dotplot for GSEA")
library(cowplot)
plot_grid(p1, p2, ncol=2)

结果展示:


图片.png

在案例中,gseNCG未富集到差异的genset,此外,目前gseNCG不推荐使用nPerm参数。

12.3 Gene-Concept Network

barplotdotplot只显示最显著富集得到terms,如果想知道在显著富集的terms中有哪些基因?cnetplot描绘了基因和生物学概念(GO,KEGG等)的关联。GSEA 结果同样可以,但是只是显示核心富集的基因

## convert gene ID to Symbol
edox <- setReadable(edo, 'org.Hs.eg.db', 'ENTREZID')
p1 <- cnetplot(edox, foldChange=geneList)
install.packages("ggnewscale")
library(ggnewscale)
## categorySize can be scaled by 'pvalue' or 'geneNum'
p2 <- cnetplot(edox, categorySize="pvalue", foldChange=geneList)
p3 <- cnetplot(edox, foldChange=geneList, circular = TRUE, colorEdge = TRUE)
cowplot::plot_grid(p1, p2, p3, ncol=3, labels=LETTERS[1:3], rel_widths=c(.8, .8, 1.2))

图片.png

合并输出图片显示不美观,可以单独输出
node_label可以设置label的类型

p1 <- cnetplot(edox, node_label="category") 
p2 <- cnetplot(edox, node_label="gene") 
p3 <- cnetplot(edox, node_label="all") 
p4 <- cnetplot(edox, node_label="none") 
cowplot::plot_grid(p1, p2, p3, p4, ncol=2, labels=LETTERS[1:4])
图片.png

gene category (A), gene name (B), both gene category and gene name (C, default) and not to label at all (D).

12.4 Heatmap-like functional classification

heatplot类似于cneplot,适合terms数目较多时使用

p1 <- heatplot(edox)
p2 <- heatplot(edox, foldChange=geneList)
cowplot::plot_grid(p1, p2, ncol=1, labels=LETTERS[1:2])

图片.png

Heatmap plot of enriched terms. default (A), foldChange=geneList (B)

12.5 enrichment MAP

Enrichment map organizes enriched terms into a network with edges connecting overlapping gene sets. In this way, mutually overlapping gene sets are tend to cluster together, making it easy to identify functional module.
emapplot支持超几何检验和GSEA分析的结果。pie_scale 设置node大小;
layout 设布局。

p1 <- emapplot(edo)
p2 <- emapplot(edo, pie_scale=1.5)
p3 <- emapplot(edo,layout="kk")
p4 <- emapplot(edo, pie_scale=1.5,layout="kk") 
cowplot::plot_grid(p1, p2, p3, p4, ncol=2, labels=LETTERS[1:4])
图片.png

Plot for results obtained from hypergeometric test and gene set enrichment analysis. default (A), pie_scale=1.5 (B), layout="kk" (C) and pie_scale=1.5,layout="kk" (D).

12.6 UnSet plot

可视化基因和基因集的关系;强调不同基因集中基因的重叠

对于over-representation analysis(ORA)分析,upsetplot计算不同基因集合的重叠,并如下展示

install.packages("ggupset")
library(ggupset)
upsetplot(edo)
图片.png

对于GSEA,将绘制不同分类中flod change的分布(例如,通路特异的,不同通路见重叠的)

upsetplot(kk2) # kk2不知道从哪里来的
图片.png

12.7 GSEA结果表达分布的脊线图

ridgeplot对GSEA富集的种类中的核心富集基因的分布进性可视化。帮助解读上调或者下调的基因

install.packages("ggridges")
ridgeplot(edo2)
图片.png

12.8 running score and preranked list of GSEA result

gseaplot2

gseaplot2(edo2, geneSetID = 1, title = edo2$Description[1])

图片.png

gseaplot2 also supports multile gene sets to be displayed on the same figure

gseaplot2(edo2, geneSetID = 1:3)
图片.png

还可以通过pvalue_table参数显示pvalue table

gseaplot2(edo2, geneSetID = 1:3, pvalue_table = TRUE,
          color = c("#E495A5", "#86B875", "#7DB0DD"), ES_geom = "dot")
图片.png

12.9 富集的pathway可视化

pathview用于KEGG pathway 的可视化

library("pathview") ##pathway未安装成功
hsa04110 <- pathview(gene.data  = geneList,
                     pathway.id = "hsa04110",
                     species    = "hsa",
                     limit      = list(gene=max(abs(geneList)), cpd=1))  
图片.png

13 其他有用的工具

13.1 bitr: Biological Id TranslatoR

eg = bitr(x, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")

13.2 setReadable: translating gene IDs to human readable symbols

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 216,258评论 6 498
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 92,335评论 3 392
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 162,225评论 0 353
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 58,126评论 1 292
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 67,140评论 6 388
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 51,098评论 1 295
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 40,018评论 3 417
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,857评论 0 273
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 45,298评论 1 310
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,518评论 2 332
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,678评论 1 348
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,400评论 5 343
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,993评论 3 325
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,638评论 0 22
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,801评论 1 268
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,661评论 2 368
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,558评论 2 352

推荐阅读更多精彩内容