1. GENIE3 accepts either UMI counts or library-size normalized counts as the input expression matrix; the two give similar results.
SCENIC: single-cell regulatory network inference and clustering
To evaluate to what extent the normalization of the input matrix affects the output of SCENIC, we also ran SCENIC on the Zeisel et al. [9] data set after library-size normalization (using the standard pipeline from scran [27], which performs within-cluster size-factor normalization). The results are highly comparable, both in regards to resulting clusters or cell types (ARI between the cell types obtained from raw UMI counts or normalized counts: 0.90, ARI from normalized counts compared to the author's cell types: 0.87) and to the TFs identifying the groups (26 out of the 30 regulons highlighted in Fig. 1b). Furthermore, during the course of this project we have applied GENIE3 to multiple data sets, some of them having UMI counts (e.g., mouse brain and oligodendrocytes) and others TPM (e.g., human brain and melanoma), and both units provided reliable results.
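As a rough illustration of the two input types compared above, the following sketch converts a raw UMI count matrix into library-size normalized, log-transformed values. It is a minimal sketch in plain Python and not the scran pipeline used in the paper (scran computes within-cluster size factors). The function name `library_size_normalize`, the genes-by-cells layout, and the scale factor of 10,000 are all assumptions made for illustration.

```python
import math


def library_size_normalize(counts, scale=10_000):
    """Scale each cell (column) to the same total count, then apply log2(x + 1).

    counts: list of rows, one row per gene, one column per cell (assumed layout).
    scale:  target total per cell; 10,000 is an illustrative choice.
    """
    n_cells = len(counts[0])
    # Library size = total counts observed in each cell.
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_cells)]
    normalized = []
    for row in counts:
        normalized.append([
            # Cells with no counts at all are left at 0 to avoid dividing by zero.
            math.log2(row[j] / lib_sizes[j] * scale + 1) if lib_sizes[j] > 0 else 0.0
            for j in range(n_cells)
        ])
    return normalized


# Toy matrix: 3 genes x 2 cells; the second cell was sequenced twice as deeply.
raw = [
    [1, 2],
    [3, 6],
    [6, 12],
]
norm = library_size_normalize(raw)
```

In this toy matrix the two cells differ only in sequencing depth, so their normalized profiles coincide. Either the raw or the normalized matrix could then be passed to GENIE3.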
2. Detailed SCENIC workflow:
Running SCENIC (htmlpreview.github.io)
The relevant excerpt:
## If launched in a new session, you will need to reload...
# setwd("...")
# loomPath <- "..."
# loom <- open_loom(loomPath)
# exprMat <- get_dgem(loom)
# close_loom(loom)
# genesKept <- loadInt(scenicOptions, "genesKept")
# exprMat_filtered <- exprMat[genesKept,]
# library(SCENIC)
# scenicOptions <- readRDS("int/scenicOptions.Rds")
# Optional: add log (if it is not logged/normalized already)
exprMat_filtered <- log2(exprMat_filtered+1)
# Run GENIE3
runGenie3(exprMat_filtered, scenicOptions)
So the tutorial appears to use normalized (log-transformed) counts.
3. SCENIC does not detect repressive regulons
SCENIC: single-cell regulatory network inference and clustering
To build the final regulons, we merge the predicted target genes of each TF module that show enrichment of any motif of the given TF. To detect repression, it is theoretically possible to follow the same approach with the negative-correlated TF modules. However, in the data sets we analyzed, these modules were less numerous and showed very low motif enrichment. For this reason, we finally decided to exclude the detection of direct repression from the workflow and continue only with the positive-correlated targets. The databases used for the analyses presented in this paper are the “18k motif collection” from iRegulon (gene-based motif rankings) for human and mouse. For each species, we used two gene-motif rankings (10 kb around the TSS or 500 bp upstream the TSS), which determine the search space around the TSS.
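The split into positive- and negative-correlated targets mentioned in the excerpt could be sketched as follows. This is a toy illustration in plain Python, not SCENIC's code. The excerpt does not state the correlation measure or cutoff, so the Pearson correlation, the cutoff of 0, and all names (`pearson`, `split_targets_by_sign`, the example genes) are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def split_targets_by_sign(expr, tf, targets, cutoff=0.0):
    """Split candidate targets of one TF by the sign of their correlation with it.

    expr:   dict mapping gene name -> expression values across cells (assumed layout).
    cutoff: hypothetical absolute correlation threshold; 0 keeps every non-zero value.
    """
    positive, negative = [], []
    for gene in targets:
        r = pearson(expr[tf], expr[gene])
        if r > cutoff:
            positive.append(gene)
        elif r < -cutoff:
            negative.append(gene)
    return positive, negative


# Toy expression values across four cells.
expr = {
    "TF1": [1.0, 2.0, 3.0, 4.0],
    "GeneA": [2.0, 4.0, 6.0, 8.0],   # rises with TF1
    "GeneB": [4.0, 3.0, 2.0, 1.0],   # falls as TF1 rises
}
activated, repressed = split_targets_by_sign(expr, "TF1", ["GeneA", "GeneB"])
```

Following the excerpt, only the positive-correlated list (`activated` here) would continue to motif enrichment; the negative-correlated list is dropped.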
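4. pySCENIC output: the reg.csv file contains the regulons and their target genes. Each row of reg.csv represents one motif and its corresponding target genes, so a single regulon may span several motifs. The SCENIC workflow takes the union of the target genes across all motifs of a regulon and then scores it with AUCell.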
How to get the list of target genes for one regulon from the output regulon.csv file of ctx · Issue #301 · aertslab/pySCENIC (github.com)
SCENIC: single-cell regulatory network inference and clustering
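The union step described in item 4 can be illustrated with a small sketch. It assumes the rows of reg.csv have already been parsed into simple `(TF, motif, target genes)` tuples; the actual file layout is not reproduced here, and the function name `merge_motif_targets` and the example identifiers are hypothetical.

```python
from collections import defaultdict


def merge_motif_targets(rows):
    """Collapse per-motif rows into one target-gene set per TF (regulon).

    rows: iterable of (tf, motif_id, target_genes) tuples (assumed, simplified layout).
    Returns a dict mapping each TF to the sorted union of its target genes.
    """
    regulons = defaultdict(set)
    for tf, _motif_id, targets in rows:
        # A TF enriched for several motifs contributes the targets of every motif.
        regulons[tf].update(targets)
    return {tf: sorted(genes) for tf, genes in regulons.items()}


# Toy rows: TF1 has two enriched motifs with overlapping targets.
rows = [
    ("TF1", "motif_a", ["GeneA", "GeneB"]),
    ("TF1", "motif_b", ["GeneB", "GeneC"]),
    ("TF2", "motif_c", ["GeneD"]),
]
regulons = merge_motif_targets(rows)
```

Each resulting gene set is what would be scored per cell in the AUCell step.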