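This post is only a record of my own learning. If you spot a mistake, please point it out and I will correct it right away!
Further reading: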
https://cloud.tencent.com/developer/article/1078313
https://www.zhihu.com/question/296543595
https://www.sohu.com/a/332185306_278730
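Coefficient of variation (CV): also called the coefficient of dispersion, it is a normalized measure of the dispersion of a probability distribution, defined as the ratio of the standard deviation to the mean.
1. Formula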
CV = (SD ÷ MN) × 100%
# SD: standard deviation
# MN: mean
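As a quick check of the formula, here is a minimal R sketch that computes the CV by hand for a single gene (the counts 7, 12, 6 are those of Pop_A01G41 in the example table). The variable names are illustrative only, and the sketch assumes the sample standard deviation (denominator n - 1), which is what R's `sd()` returns.

```r
# Counts for one gene across three samples (CK, treat1, treat2)
x <- c(7, 12, 6)

mn <- mean(x)          # arithmetic mean (MN)
sd_x <- sd(x)          # sample standard deviation (SD), denominator n - 1
cv <- sd_x / mn * 100  # CV expressed as a percentage

round(cv, 1)           # 38.6
```

Dropping the `* 100` gives the same quantity as a fraction (about 0.386) instead of a percentage.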
2. Data preparation
Geneid CK treat1 treat2
Pop_A01G39 0 0 0
Pop_A01G40 0 0 1
Pop_A01G41 7 12 6
Pop_A01G42 0 0 0
Pop_A01G43 0 0 0
Pop_A01G44 264 242 221
Pop_A01G45 0 3 0
Pop_A01G46 104 73 111
Pop_A01G47 182 230 225
Pop_A01G48 51 57 38
# The first column is the gene ID; the remaining columns are the expression values in each sample
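To experiment without a file on disk, the same table can be read from an inline string. This is only a sketch: it assumes whitespace-separated columns, and `counts_text` and `counts` are illustrative names that do not appear elsewhere in this post.

```r
# The example table as an inline string (whitespace-separated)
counts_text <- "Geneid CK treat1 treat2
Pop_A01G39 0 0 0
Pop_A01G40 0 0 1
Pop_A01G41 7 12 6
Pop_A01G42 0 0 0
Pop_A01G43 0 0 0
Pop_A01G44 264 242 221
Pop_A01G45 0 3 0
Pop_A01G46 104 73 111
Pop_A01G47 182 230 225
Pop_A01G48 51 57 38"

# header = TRUE uses the first line as column names;
# row.names = 1 turns the Geneid column into row names
counts <- read.table(text = counts_text, header = TRUE, row.names = 1)

dim(counts)  # 10 genes, 3 samples
str(counts)  # every sample column should be numeric
```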
3. Computing the CV (R packages)
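library(tidyverse)    # provides %>% and mutate() (via dplyr)
library(matrixStats)  # provides rowSds()

infile <- 'counts.txt'
workdir <- 'C:\\Users\\Amy\\Desktop'
setwd(workdir)

# Read the tab-separated count table; the first column (Geneid) becomes the row names
data <- read.csv(infile, header = TRUE, row.names = 1, sep = '\t')
head(data)
dim(data)

# Drop genes whose expression is 0 in every sample (their mean is 0, so the CV is undefined)
filter_data <- data[rowSums(data) != 0, ]
dim(filter_data)

# Per-gene mean, standard deviation, and CV (as a fraction; multiply by 100 for a percentage)
result_data <- filter_data %>%
  mutate(mean = rowMeans(filter_data), sd = rowSds(as.matrix(filter_data))) %>%
  mutate(CV = sd / mean)
# Note: rowSds() returns the sample standard deviation (denominator n - 1), which matches Excel's STDEV.S.
# Excel's STDEV.P returns the population standard deviation (denominator n), so its values are slightly smaller.

write.table(result_data, file = 'gene_CV.txt', quote = FALSE, sep = '\t')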
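To make the difference between the two standard deviations concrete, here is a small base-R sketch that computes the CV both ways for three genes from the example table. It is not the method used above, just an illustration: `m`, `sd_sample`, `sd_pop`, and `cv_table` are hypothetical names, and the population SD is obtained by rescaling the sample SD by sqrt((n - 1) / n).

```r
# Three genes taken from the example table (rows = genes, columns = samples)
m <- rbind(
  Pop_A01G41 = c(CK = 7,   treat1 = 12,  treat2 = 6),
  Pop_A01G44 = c(CK = 264, treat1 = 242, treat2 = 221),
  Pop_A01G48 = c(CK = 51,  treat1 = 57,  treat2 = 38)
)

n <- ncol(m)                             # number of samples per gene
mn <- rowMeans(m)                        # per-gene mean
sd_sample <- apply(m, 1, sd)             # denominator n - 1 (like rowSds / STDEV.S)
sd_pop <- sd_sample * sqrt((n - 1) / n)  # denominator n (like STDEV.P)

cv_table <- data.frame(
  mean = mn,
  cv_sample = sd_sample / mn,
  cv_pop = sd_pop / mn
)
cv_table
```

With only three samples per gene the two versions differ by a factor of sqrt(2/3), roughly 0.82, so it is worth checking which one a spreadsheet used before comparing results.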