20210305 16:36
最近在处理single cell 数据,老板要求用DESeq2跑差异基因,Seurat中自带的DESeq2跑出来的有很多是0,很奇怪,也不懂是啥原因,没有往下面细细研究,后来师姐用DESeq2直接跑,就是把每一个细胞当成一个样本来跑的,细胞数目少的时候跑的时间用的少,但是细胞数目多的时候,时间就会用的多。之前都是换一组数据,就得写个代码,很麻烦。今天给它包装了下,跑起来方便多了,很开心。最近的粉丝涨了三个,开心开心,虽然离100的粉丝量还差好多,但是是在进步的就好。
1.函数
#!/~/bin/Rscript
## 第一行是Linux上Rscript上所在位置
# Function: This is a pipeline to find DEGs using DESeq2
# Input: 1.rds odject which includes counts variable and label variable; 2. outputname to save the DEGs
# Output: csv of DESeq2 DEGs
# Version: 2021-03-05, Yiyi Zheng
library(optparse)
library(getopt)
option_list <- list(
make_option(c("-o", "--output"), type = "character", default = FALSE,
action = "store", help = "This is the output directory."
),
make_option(c("--rds_name"), type = "character", default = FALSE,
action = "store", help = "This is the name of rds to input."
),
make_option(c("--output_name"), type = "character", default = FALSE,
action = "store", help = "This is the name of ouput file."
)
)
# -help
opt = parse_args(OptionParser(option_list = option_list,
usage = "This is a script to find DEGs using DESeq2"))
print(opt)
# set the output path
library(DESeq2)
DESeq2_Object <- readRDS(opt$rds_name)
setwd(opt$output)
timestart<-Sys.time()
print("The time begining:")
print(timestart)
## Main function
print("The level and number of input data:")
# The levels of condition need to assign by yourself. DESeq2 will use the first level as the condtion.
#condition<-factor(c(rep("Control",7104), rep("Obesity",6965)), levels = c("Control","Obesity"))
condition <- factor(DESeq2_Object$label)
print(levels(condition))
print(table(condition))
print("The contrl is")
print(table(condition)[1])
print("The treat is")
print(table(condition)[2])
count <- DESeq2_Object$counts
count <- as.matrix(count) # Need to transform to matrix.
coldata <- data.frame(row.names=colnames(count), condition)
count_2 <- count[which(rowSums(count) > 0), ] # Exclude the gene which is 0 in all cells.
count <- count_2+1
dds <- DESeqDataSetFromMatrix(count,coldata,design=~condition)
dds <- DESeq(dds)
res <- results(dds)
outputname <- opt$output_name
print("Finished to calculate DEGs and now writing the DEGs results.")
write.csv(res, outputname)
timeend<-Sys.time()
print("The time ending:")
print(timeend)
runningtime<-timeend-timestart
print("The time DESeq2 used is ")
print(runningtime)
2.使用
该函数的使用就是在Linux上直接输入参数就可以了,但是之前你需要构建一个rds object, 其中两个变量是counts和label,构建如下:
rm(list = ls())
list.files()
library(Seurat)
Female_exp <- readRDS("20210305_Female_DESeq2_Exp.rds")
all <- list()
all$counts <- Female_exp$counts
all$label <- Female_exp$sample_label
print(table(all$label))
## DESeq2中会把第一个level 作为control,第二个level作为treat.这个需要确认好,是谁对谁找的差异基因
saveRDS(all , "Female_allcells.rds")
然后就可以挂在后台上跑了
i=Female_allcells.rds
nohup RunDESeq2_Linux.R -o . --rds_name $i --output_name $i.csv > $i.output.file 2>&1 &
进一步有一步的欢喜!!
今天是前进的一天