前言
Immugent在之前的一篇推文;NetAct:利用基因活性构建核心转录因子调控网络中,介绍了NetAct的整体框架。简单来说,NetAct是一个新的转录因子分析软件,其使用转录组数据和基于文献的转录因子-靶标数据库来构建核心转录因子调控网络。今天就通过代码实操的方式展示如何使用NetAct来对我们的单细胞数据进行转录因子分析。
本篇推文旨在演示NetAct的核心功能的分析流程,以及如何使用它来构建转录因子调控网络并建立模型。该工作流程中使用的实验RNA-seq数据来自Sheridan等人(2015),由三种细胞群,包含basal细胞、luminal progenitor(LP)和mature luminal(ML)组成,这些细胞群来自雌性未交配小鼠的乳腺。使用GEO系列登录号GSE63310,这些样本的计数数据可以从GEO数据库上下载。
代码实操
首先我们需要下载包含每个样本原始基因水平计数的文本文件,示例文件GSE63310_RAW.tar可在http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE63310&format=file上下载。
library(NetAct)
url <- "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE63310&format=file"
utils::download.file(url, destfile="GSE63310_RAW.tar", mode="wb")
utils::untar("GSE63310_RAW.tar", exdir = ".")
files <- c("GSM1545535_10_6_5_11.txt", "GSM1545536_9_6_5_11.txt", "GSM1545538_purep53.txt",
"GSM1545539_JMS8-2.txt", "GSM1545540_JMS8-3.txt", "GSM1545541_JMS8-4.txt",
"GSM1545542_JMS8-5.txt", "GSM1545544_JMS9-P7c.txt", "GSM1545545_JMS9-P8c.txt")
for(i in paste(files, ".gz", sep="")){
R.utils::gunzip(i, overwrite=TRUE)
}
files <- c("GSM1545535_10_6_5_11.txt", "GSM1545536_9_6_5_11.txt",
"GSM1545538_purep53.txt", "GSM1545539_JMS8-2.txt",
"GSM1545540_JMS8-3.txt", "GSM1545541_JMS8-4.txt",
"GSM1545542_JMS8-5.txt", "GSM1545544_JMS9-P7c.txt",
"GSM1545545_JMS9-P8c.txt")
x <- edgeR::readDGE(files, columns=c(1,3))
group <- as.factor(c("LP", "ML", "Basal", "Basal", "ML", "LP", "Basal", "ML", "LP"))
x$samples$group <- group
samplenames <- c("LP1", "ML1", "Basal1", "Basal2", "ML2", "LP2", "Basal3", "ML3", "LP3")
colnames(x) <- samplenames
Set up comparisons and phenodata
compList <- c("Basal-LP", "Basal-ML", "LP-ML")
phenoData = new("AnnotatedDataFrame", data = data.frame(celltype = group))
rownames(phenoData) = colnames(x$counts)
Preprocess and DEG analysis
counts <- Preprocess_counts(counts = x$counts, groups = group, mouse = TRUE)
DErslt = RNAseqDegs_limma(counts = counts, phenodata = phenoData,
complist = compList, qval = 0.05)
neweset = Biobase::ExpressionSet(assayData = as.matrix(DErslt$Overall$e), phenoData = phenoData)
Identify significantly enriched TFs(最关键的一步)
data("mDB")
calc <- FALSE
if (calc) {
gsearslts <- TF_Selection(GSDB = mDB, DErslt = DErslt, minSize = 5, nperm = 10000,
qval = 0.05, compList = compList,
nameFile = "gsearslts_tutorial")
} else {
gsearslts <- readRDS(file = "gsearslts_tutorial.RDS")
}
tfs <- gsearslts$tfs
tfs
Reselect_TFs(GSEArslt = gsearslts$GSEArslt, qval = 0.01)
Calculate the activities of the selected TFs
act.me <- TF_Activity(tfs, mDB, neweset, DErslt$Overall)
acts_mat = act.me$all_activities
Activity_heatmap(acts_mat, neweset)
Construct gene regualtory network
tf_links = TF_Filter(acts_mat, mDB, miTh = .05, nbins = 8, corMethod = "spearman", DPI = T)
plot_network(tf_links)
RACIPE simulations
racipe_results <- sRACIPE::sracipeSimulate(circuit = tf_links, numModels = 200, plots = TRUE)
说在最后
在NetAct之前,对单细胞数据进行转录因子分析的软件我们听到最多的都是SCENIC,以及它的python版本--pySCENIC。但是这两个软件使用起来都很不方便,而且没有具体展示各基因(转录因子)之间的调控关系。而NetAct的出现,使我们彻底告别SCENIC的垄断时代,它不仅轻便而且能展示更多的转录因子调控关系,感兴趣的小伙伴赶紧实操起来吧。
好啦,本期分享到这就结束了,我们下期再会~~