The Module Score calculation is designed to refine the mean expression of a gene signature by accounting for potential batch effects and other sources of variation. While the simplest form of Module Score could be the average expression of all genes in a given signature, this approach may be skewed by extraneous variables. To mitigate this, the Module Score incorporates a background reference, which serves as an adjustment factor.
The adjustment factor is derived from the overall expression matrix. Specifically, for each gene j within the signature, a set of background genes is selected based on the gene j's mean expression level. This selection process is guided by the parameter "ctrl" within the function, which controls the number of background genes to be chosen.
To elaborate, the method involves calculating the mean expression of all genes across the matrix and then dividing them into several bins (using function cut_number), with the number of bins determined by the parameter "bins". For each gene j in the signature, "ctrl" number of genes (defaulting to 100) from the same bin are selected as background genes. This process is repeated for each gene in the signature, ultimately yielding a control gene set, denoted as "ctrl.use[[i]]", where "i" represents a specific signature.
Using this control gene set, we calculate the mean expression for each cell, which gives us the adjustment factor for each cell. The Module Score for signature i is then computed as:
This adjusted Module Score provides a more accurate representation of the gene signature's expression, free from confounding effects, and is particularly useful for comparing expression profiles across different cell types or conditions.
The most important parts of orignal codes is shown below:
pool <- pool %||% rownames(x = object)
data.avg <- Matrix::rowMeans(x = assay.data[pool, ])
data.avg <- data.avg[order(data.avg)]
data.cut <- cut_number(x = data.avg + rnorm(n = length(data.avg))/1e30, n = nbin, labels = FALSE, right = FALSE)
#data.cut <- as.numeric(x = Hmisc::cut2(x = data.avg, m = round(x = length(x = data.avg) / (nbin + 1))))
names(x = data.cut) <- names(x = data.avg)
ctrl.use <- vector(mode = "list", length = cluster.length)
for (i in 1:cluster.length) {
features.use <- features[[i]]
for (j in 1:length(x = features.use)) {
ctrl.use[[i]] <- c(
ctrl.use[[i]],
names(x = sample(
x = data.cut[which(x = data.cut == data.cut[features.use[j]])],
size = ctrl,
replace = FALSE
))
)
}
}