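1. First, look at the PI3K-AKT signaling pathway gene set in the KEGG database
For a detailed walkthrough, see the earlier post on how to get all the gene names in the KEGG pathway hsa04650 (Natural killer cell mediated cytotoxicity).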
library(KEGGREST)
listDatabases() # list the databases available through KEGGREST; these names can be used in later queries
org <- keggList("organism")
head(org)
gs<-keggGet('hsa04151')
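names(gs[[1]]) # field names of the pathway entry, as described in the vignette
genes <- unlist(lapply(gs[[1]]$GENE, function(x) strsplit(x, ';')[[1]][1]))
kegggenes <- genes[seq_along(genes) %% 2 == 1] # odd positions hold the Entrez IDs, even positions the symbols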
kegggenes
img <- keggGet("hsa04151", "image") # pathway diagram for hsa04151
png_file <- tempfile(fileext = ".png")
library(png)
writePNG(img, png_file)
if (interactive()) browseURL(png_file)
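The odd/even indexing used to build kegggenes depends on how KEGGREST returns the GENE field: a flat character vector that alternates an Entrez ID with a "SYMBOL; description" string. The sketch below replays that parsing on a hand-made vector so it can be run offline. The three entries are toy stand-ins for gs[[1]]$GENE rather than a real download, and the helper name split_kegg_gene_field is made up for illustration.

```r
# Toy stand-in for gs[[1]]$GENE (assumption: IDs at odd positions,
# "SYMBOL; description" at even positions)
gene_field <- c(
  "207",  "AKT1; AKT serine/threonine kinase 1",
  "5728", "PTEN; phosphatase and tensin homolog",
  "5290", "PIK3CA; toy description"
)

# Hypothetical helper: split the alternating vector into a two-column table
split_kegg_gene_field <- function(x) {
  odd  <- seq(1, length(x), by = 2)   # positions of the Entrez IDs
  even <- seq(2, length(x), by = 2)   # positions of "SYMBOL; description"
  data.frame(
    ENTREZID = x[odd],
    SYMBOL   = sub(";.*$", "", x[even]),  # keep only the text before the first ';'
    stringsAsFactors = FALSE
  )
}

kegg_tbl <- split_kegg_gene_field(gene_field)
print(kegg_tbl)
```

Keeping the symbol next to the ID in one table makes it easier to spot-check the extraction before comparing against another database.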
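2. Next, look at the PI3K-AKT signaling pathway gene set in the Reactome database
Reactome website:
https://reactome.org/documentation
Searching for pi3k/akt returns six pathways related to PI3K/AKT. I picked three of them (198203, 199418 and 2219528) and extracted their genes with the reactome.db package.
## reactome.db is an annotation package, and a large one at 615.9 MB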
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("reactome.db")
library(reactome.db)
ls("package:reactome.db")
keytypes(reactome.db)
# list the column names available in this object
columns(reactome.db)
# read all values of one specific key type
keys(reactome.db, keytype = "PATHNAME")
# finally, use keys to query the annotation database
AnnotationDbi::select(reactome.db, keys = c("6794"), columns = c("PATHID", "PATHNAME"), keytype = "ENTREZID") ## pathways that contain a single gene
path2genes <- as.list(reactomePATHID2EXTID) # pathway ID -> Entrez gene IDs, converted once
g1 <- path2genes[["R-HSA-198203"]]
g2 <- path2genes[["R-HSA-199418"]]
g3 <- path2genes[["R-HSA-2219528"]]
reagenes <- union(c(g1, g2), g3) ## take the union
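When more than a handful of pathways are selected, the same union can be taken in one step by subsetting the pathway list with a vector of IDs. This is a minimal sketch on a toy list shaped like as.list(reactomePATHID2EXTID); the gene IDs under each pathway are placeholders, not the real Reactome memberships, and "R-HSA-0000000" is a made-up entry that only shows unselected pathways being ignored.

```r
# Toy stand-in for as.list(reactomePATHID2EXTID): pathway ID -> Entrez IDs
# (placeholder values, NOT the real pathway contents)
path2genes_toy <- list(
  "R-HSA-198203"  = c("207", "208", "5290"),
  "R-HSA-199418"  = c("5290", "5728"),
  "R-HSA-2219528" = c("207", "1956"),
  "R-HSA-0000000" = c("9999")            # made-up pathway, not selected below
)

wanted <- c("R-HSA-198203", "R-HSA-199418", "R-HSA-2219528")

# unlist() concatenates the selected pathways; unique() turns that into a union
reagenes_toy <- unique(unlist(path2genes_toy[wanted], use.names = FALSE))
print(reagenes_toy)
```

Adding or dropping a pathway then only means editing the wanted vector.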
3. Look at the intersection
intersect(kegggenes, reagenes)
##[1] "1950" "2069" "2246" "2247" "2248" "2249" "8822" "2251" "2252" "2253" "2254" "2255"
##[13] "8823" "2250" "8817" "26281" "27006" "9965" "8074" "4803" "3630" "5154" "5155" "4254"
##[25] "3082" "1956" "2064" "2065" "2066" "2260" "2263" "2261" "2264" "4914" "3643" "5156"
##[37] "5159" "3815" "4233" "2885" "5594" "5595" "3667" "5879" "930" "118788" "5290" "5293"
##[49] "5291" "5295" "5296" "8503" "5170" "7249" "64223" "2475" "6199" "207" "208" "10000"
##[61] "5728" "117145" "5515" "5516" "5519" "5518" "5526" "5527" "5528" "5529" "5525" "23239"
##[73] "23035" "2932" "1026" "1027" "2309" "572" "842" "1385" "3164" "1147" "4193"
setdiff(kegggenes, reagenes) ## genes found only in the KEGG set
setdiff(reagenes, kegggenes) ## genes found only in the Reactome set
##[1] "387" "8660" "10718" "10818" "145957" "152831" "1839" "2099" "2100" "23396" "2534" "2549"
##[13] "29851" "3084" "3556" "3654" "391" "3932" "4615" "50852" "51135" "5305" "57761" "5781"
##[25] "5880" "6714" "685" "7189" "7409" "79837" "8394" "8395" "8396" "8870" "90865" "9173"
##[37] "9365" "940" "941" "942" "9542" "2308" "253260" "2931" "4303" "55615" "79109" "84335"
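The three comparisons above print long ID vectors, and a few counts are often easier to read. The sketch below wraps them in a small summary function and runs it on toy vectors. The name overlap_summary is hypothetical, and the Jaccard index is an extra measure that the original comparison does not use.

```r
# Hypothetical helper: sizes of two gene sets, their overlap, and their differences
overlap_summary <- function(x, y) {
  x <- unique(x)
  y <- unique(y)
  shared <- intersect(x, y)
  c(
    n_x     = length(x),
    n_y     = length(y),
    shared  = length(shared),
    only_x  = length(setdiff(x, y)),
    only_y  = length(setdiff(y, x)),
    jaccard = length(shared) / length(union(x, y))  # shared / all distinct genes
  )
}

# Toy Entrez ID vectors (placeholders for kegggenes and reagenes)
kegg_toy     <- c("207", "208", "5290", "5728", "1956", "2475")
reactome_toy <- c("207", "5290", "5728", "387")

res <- overlap_summary(kegg_toy, reactome_toy)
print(res)
```

With the real vectors the call would be overlap_summary(kegggenes, reagenes).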
Gene ID conversion
library( "clusterProfiler" )
library( "org.Hs.eg.db" )
df <- bitr( intersect(kegggenes, reagenes), fromType = "ENTREZID", toType = c( "SYMBOL" ), OrgDb = org.Hs.eg.db )
head( df )
## ENTREZID SYMBOL
## 1 1950 EGF
## 2 2069 EREG
## 3 2246 FGF1
## 4 2247 FGF2
## 5 2248 FGF3
## 6 2249 FGF4
From the results above, the KEGG PI3K-AKT signaling pathway gene set contains more genes, whereas the Reactome gene sets are already split into individual signaling pathways and are therefore more specific in terms of function.