limma是一个很强大的用于分析芯片的R包,也可以用于RNA-Seq的差异分析
以两个组比较为例:首先输入count表达矩阵,这里也跟其他差异分析R包一样,不要输入已经标准化的数据。
本文主要参考:https://www.bioinfo-scrounger.com/archives/115/
library(limma)
library(edge)
counts <- read.csv("raw_counts.csv",row.names = 1)
#Creates a DGEList
dge_counts <- DGEList(counts = counts,remove.zeros = T)
#Calculate normalization factors to scale the raw library sizes.
#用TMM进行标准化
tmm_counts <- calcNormFactors(dge_counts)
#count进行标准化以及转化为log2的值
logCPM_counts <- cpm(tmm_counts , log=TRUE)
制作分组矩阵
#设置分组信息
group_list <- factor(c(rep("control",2), rep("treat",2)))
design <- model.matrix(~group_list)
colnames(design) <- levels(group_list)
rownames(design) <- colnames(counts)
为了避免文库大小在样本间变化的影响,可以使用limma的voom方法进行处理
y = voom(logCPM_counts, design, plot = T)
对voom的描述
Transform count data to log2-counts per million (logCPM), estimate the mean-variance relationship and use this to compute appropriate observation-level weights. The data are then ready for linear modelling.
voom()作用是原始counts转换为logCPM值,将所有计数加0.5,以避免取对数零。然后,将logCPM值矩阵进行标准化。在运行voom()之前,应对counts矩阵进行过滤以除去counts非常低的基因。为此,可以使用edgeR包中的filterByExpr函数。
因为我已经通过edgeR的TMM标准化,所以效果还行,不需要处理,如果效果不好可能由于数据没有过滤好,如下图:
对voom图的解释可以参考这里:
https://stats.stackexchange.com/questions/160255/voom-mean-variance-trend-plot-how-to-interpret-the-plot
差异分析:
#不需要voom转化时:
fit <- lmFit(logCPM_counts, design)
fit <- eBayes(fit)
DE_genes <- topTable(fit, coef = 2,p.value = 0.5, lfc = log2(1.5), number = Inf,sort.by="logFC")
#不进行TMM转化,即不运行calcNormFactors(),直接进行voom转化
y = voom(counts, design, plot = T)
fit <- lmFit(y, design)
fit <- eBayes(fit)
DE_genes <- topTable(fit, coef = 2,p.value = 0.05, lfc = log2(1.5), number = Inf,sort.by="logFC")
欢迎关注~
参考:
https://www.bioinfo-scrounger.com/archives/115/
https://www.jianshu.com/p/616de0ee881a
https://mp.weixin.qq.com/s?__biz=MzI4NjMxOTA3OA==&mid=2247483987&idx=1&sn=aa2ca81e7fe128edaaedc47479c517c9&chksm=ebdf8adadca803cc31261a1ccabf8a6bdb835ce12b670cc01969fcfde51cfdb991d0619e7695&scene=21#wechat_redirect