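Goals
1. Quickly look up the description for a pathway ID, and find out which ID a given pathway has.
2. Pull out all the genes in one or several pathways for separate downstream analysis.
If what you want is a KEGG enrichment analysis, clusterProfiler has it covered: https://bioconductor.org/packages/release/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html
To view KEGG pathway diagrams, use the R package pathview; the help pages for its functions are all you need.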
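1. Mapping pathway IDs to descriptions
1.1 Searching the website
If you don't need a batch lookup, searching the KEGG website is the simplest option: https://www.genome.jp/kegg/kegg2.html
1.2 Using msigdbr
To get the full set of mappings, use msigdbr, which was covered in an earlier post: https://www.jianshu.com/p/0098baf2df46
MSigDB already includes the KEGG gene sets, and the coverage is fairly complete: IDs, descriptions, and genes are all there.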
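library(msigdbr)
library(dplyr) # provides the %>% pipe and select()
# Note: these arguments match the msigdbr version used when this was written; newer releases may name the KEGG subcategory differently (e.g. "CP:KEGG_LEGACY").
KEGG_df = msigdbr(species = "Homo sapiens",category = "C2",subcategory = "CP:KEGG") %>%
  dplyr::select(gs_exact_source,gene_symbol,gs_description)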
head(KEGG_df)
## # A tibble: 6 x 3
## gs_exact_source gene_symbol gs_description
## <chr> <chr> <chr>
## 1 hsa02010 ABCA1 ABC transporters
## 2 hsa02010 ABCA10 ABC transporters
## 3 hsa02010 ABCA12 ABC transporters
## 4 hsa02010 ABCA13 ABC transporters
## 5 hsa02010 ABCA2 ABC transporters
## 6 hsa02010 ABCA3 ABC transporters
kegg1 = split(KEGG_df$gene_symbol,KEGG_df$gs_exact_source)
lapply(kegg1[1:6],head)
## $hsa00010
## [1] "ACSS1" "ACSS2" "ADH1A" "ADH1B" "ADH1C" "ADH4"
##
## $hsa00020
## [1] "ACLY" "ACO1" "ACO2" "CS" "DLAT" "DLD"
##
## $hsa00030
## [1] "ALDOA" "ALDOB" "ALDOC" "DERA" "FBP1" "FBP2"
##
## $hsa00040
## [1] "AKR1B1" "CRYL1" "DCXR" "DHDH" "GUSB" "RPE"
##
## $hsa00051
## [1] "AKR1B1" "AKR1B10" "ALDOA" "ALDOB" "ALDOC" "FBP1"
##
## $hsa00052
## [1] "AKR1B1" "B4GALT1" "B4GALT2" "G6PC" "G6PC2" "GAA"
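For the first goal (going back and forth between an ID and its description), a small lookup table is handy. The sketch below is only an illustration: `toy_df` is a made-up stand-in with the same three columns as `KEGG_df`, so it runs without downloading anything, and the names `id2desc`, `desc` and `hits` are my own. Swap in the real `KEGG_df` to use it for real.

```r
# Toy stand-in for KEGG_df (same three columns); replace with the real table.
toy_df <- data.frame(
  gs_exact_source = c("hsa02010", "hsa02010", "hsa00010", "hsa00010"),
  gene_symbol     = c("ABCA1", "ABCA10", "ACSS1", "ACSS2"),
  gs_description  = c("ABC transporters", "ABC transporters",
                      "Glycolysis / Gluconeogenesis",
                      "Glycolysis / Gluconeogenesis"),
  stringsAsFactors = FALSE
)

# Collapse to one row per pathway: ID <-> description.
id2desc <- unique(toy_df[, c("gs_exact_source", "gs_description")])

# ID -> description
desc <- id2desc$gs_description[id2desc$gs_exact_source == "hsa02010"]

# keyword -> ID (case-insensitive match on the description)
hits <- id2desc$gs_exact_source[
  grepl("glycolysis", id2desc$gs_description, ignore.case = TRUE)
]

print(desc)
print(hits)
```

Matching on a keyword rather than the exact description is deliberate here, since the wording of descriptions can differ between data sources.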
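2. Mapping pathway IDs to genes
The org.Hs.eg.db package has this mapping (note that org.Hs.egPATH2EG is built from an old KEGG snapshot and may be deprecated or missing in recent Bioconductor releases):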
library(clusterProfiler)
library(org.Hs.eg.db)
kegg <- org.Hs.egPATH2EG
mapped <- mappedkeys(kegg)
kegg2 <- as.list(kegg[mapped])
lapply(kegg2[1:6],head)
## $`04610`
## [1] "2" "462" "623" "624" "629" "710"
##
## $`00232`
## [1] "9" "10" "1544" "1548" "1549" "1553"
##
## $`00983`
## [1] "9" "10" "978" "1066" "1548" "1549"
##
## $`01100`
## [1] "9" "10" "15" "18" "28" "30"
##
## $`00380`
## [1] "15" "26" "38" "39" "217" "219"
##
## $`00970`
## [1] "16" "833" "1615" "2058" "2193" "2617"
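Looks like a pile of secret codes? In this list, the names are the pathway IDs with the hsa prefix dropped, and the contents are the genes' Entrez IDs.
For example, extract the genes in hsa03030 and convert them to symbols.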
genes = kegg2[["03030"]] # [[ ]] returns the plain character vector, without the extra names unlist() adds
length(genes)
## [1] 36
# To turn them into symbols, just use bitr
genes = bitr(genes,
             fromType = "ENTREZID",
             toType = "SYMBOL",
             OrgDb = "org.Hs.eg.db")$SYMBOL
head(genes)
## [1] "DNA2" "FEN1" "LIG1" "MCM2" "MCM3" "MCM4"
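The same idea extends to several pathways at once. Below is a rough sketch, not the only way to do it: `toy_kegg2` is a tiny stand-in for `kegg2` (the Entrez IDs are copied from the output printed earlier), and `strip_hsa` is a hypothetical helper I added so that IDs can be given with or without the hsa prefix.

```r
# Toy stand-in for kegg2: Entrez IDs keyed by pathway ID without "hsa".
toy_kegg2 <- list(
  "04610" = c("2", "462", "623"),
  "00232" = c("9", "10", "1544"),
  "00983" = c("9", "10", "978")
)

# Hypothetical helper: accept IDs with or without the "hsa" prefix.
strip_hsa <- function(ids) sub("^hsa", "", ids)

wanted <- strip_hsa(c("hsa00232", "00983"))

# Keep only the requested pathways that actually exist in the list.
found <- toy_kegg2[intersect(wanted, names(toy_kegg2))]

# Option 1: one de-duplicated gene vector across the selected pathways.
all_genes <- unique(unlist(found, use.names = FALSE))

# Option 2: a long table recording which pathway each gene came from.
long_df <- stack(found)
names(long_df) <- c("entrez_id", "pathway")

print(all_genes)
print(long_df)
```

With the real data, replace `toy_kegg2` with `kegg2`; the resulting Entrez IDs can then go through bitr as above. The long table is the more useful form when a gene sits in more than one of the selected pathways and you need to know which.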