DeconRNASeq: A Statistical Framework for Deconvolution of Heterogeneous Tissue Samples Based on mRNA-Seq Data
Overview of DeconRNASeq
The DeconRNASeq package uses a nonnegative decomposition algorithm, solved through quadratic programming, to estimate the mixing proportions of distinct tissue types in next-generation sequencing data. It requires two R data frames as input:
- datasets: the raw mRNA expression data matrix (genes by samples)
- signatures: known signatures of specific cell types or tissues (genes by cell types)

The two inputs are related by datasets = signatures * A, where A is the quantity to be estimated:
- A: the cell type concentration matrix (cell types by samples)
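Written out for a single sample, the relation datasets = signatures * A can be posed as a constrained least-squares problem. The symbols $S$, $m$, $a$ and $K$ are notation introduced here for illustration, and the sum-to-one constraint is an assumption that follows from reading the entries of A as proportions:

```latex
\min_{a} \; \lVert S a - m \rVert_2^2
\quad \text{subject to} \quad
a_k \ge 0 \;\; (k = 1, \dots, K), \qquad \sum_{k=1}^{K} a_k = 1
```

Here $S$ is the signatures matrix (genes by cell types), $m$ is one column of datasets (the expression profile of one sample), and $a$ is the matching column of A for the $K$ cell types. Expanding the objective gives $a^\top S^\top S\, a - 2\, m^\top S\, a + m^\top m$, which is quadratic in $a$ with linear constraints, so it fits the quadratic programming form mentioned above.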
Pipeline for using DeconRNASeq
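- Install the DeconRNASeq package

    ## biocLite() has been replaced by BiocManager in current Bioconductor releases
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("DeconRNASeq")
    library(DeconRNASeq)
    ## view documentation
    browseVignettes("DeconRNASeq")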
- Run the example

    ## multi_tissue: expression profiles for 10 mixing samples from multiple tissues
    data(multi_tissue)
    datasets <- x.data[,2:11]
    ## tissue-specific signatures for different human tissues
    signatures <- x.signature.filtered.optimal[,2:6]
    proportions <- fraction
    ## deconvolution
    DeconRNASeq(datasets, signatures, proportions, checksig=FALSE, known.prop = TRUE, use.scale = TRUE, fig = TRUE)
- datasets:

    The datasets matrix contains 28745 genes and 10 samples. Column names are the sample names, and row names are the gene names.

    > head(datasets,3)
                reads.1.RPKM reads.2.RPKM reads.3.RPKM reads.4.RPKM reads.5.RPKM
    NR_024540      3.6682100      3.78953     8.254980     7.693440     5.637220
    NR_028325.1    0.0796274      0.14644     0.104652     0.376109     0.104008
    NR_028322.1    0.0796274      0.14644     0.104652     0.376109     0.104008
                reads.6.RPKM reads.7.RPKM reads.8.RPKM reads.9.RPKM reads.10.RPKM
    NR_024540       6.358460     5.941820     6.555140     7.784240     5.9895300
    NR_028325.1     0.160564     0.188188     0.133709     0.244789     0.0885794
    NR_028322.1     0.160564     0.188188     0.133709     0.244789     0.0885794
    > dim(datasets)
    [1] 28745    10
- signatures:

    The filtered signature data matrix contains 1570 genes for the five tissues. Row names are the gene names, and column names are the tissues (or cell types) in the mixture.

    > head(signatures,3)
                    brain    muscle     lung    liver     heart
    NR_024540   2.4742600 3.3782600 3.093570 1.279540 0.8652710
    NR_028325.1 0.0675838 0.0556031 0.515925 0.085452 0.0830035
    NR_028322.1 0.0675838 0.0556031 0.515925 0.085452 0.0830035
    > dim(signatures)
    [1] 1570    5
- proportions:

    This data matrix gives the proportions of the different tissues (or cell types) in each sample. Here it holds the proportions of 5 tissues in 10 samples.

    > head(proportions,3)
                  brain muscle   lung  liver  heart
    reads.1.RPKM 0.0463 0.0323 0.0805 0.0747 0.7662
    reads.2.RPKM 0.0606 0.1156 0.0278 0.6960 0.1000
    reads.3.RPKM 0.0728 0.6058 0.1051 0.1262 0.0900
    > dim(proportions)
    [1] 10  5
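To make the shapes of the three matrices concrete, here is a minimal base-R sketch on made-up toy data (not multi_tissue). All object names and numbers in it are hypothetical. It also swaps in plain unconstrained least squares (`qr.solve`) for the constrained quadratic program used by DeconRNASeq, so it only illustrates how the matrices line up and how an estimate can be compared with known proportions:

```r
# Toy illustration in base R; all names and values are made up for this sketch.
set.seed(1)
genes   <- paste0("gene", 1:6)
tissues <- c("brain", "muscle", "lung")

# signatures: genes by cell types
signatures <- matrix(runif(18, 1, 10), nrow = 6,
                     dimnames = list(genes, tissues))

# A: cell types by samples; each column sums to 1
A <- cbind(s1 = c(0.2, 0.3, 0.5),
           s2 = c(0.7, 0.1, 0.2))
rownames(A) <- tissues

# datasets: genes by samples, built from datasets = signatures %*% A,
# plus one extra gene that has no signature
datasets <- rbind(signatures %*% A, extra = c(1, 1))

# keep only the genes present in both inputs, in the same order
common <- intersect(rownames(datasets), rownames(signatures))
S <- signatures[common, , drop = FALSE]
M <- datasets[common, , drop = FALSE]

# unconstrained least squares (a stand-in, NOT the DeconRNASeq solver):
# one column of A is estimated per sample
A_hat <- qr.solve(S, M)

# transpose to samples by cell types, the layout used by `proportions`
est  <- t(A_hat)
rmse <- sqrt(mean((est - t(A))^2))
print(round(est, 4))
print(rmse)
```

Because the toy data are noise-free, unconstrained least squares already recovers A here. With real expression data the nonnegativity constraint matters, which is why DeconRNASeq solves a quadratic program instead. Note also that `proportions` is stored as samples by tissues, the transpose of A as defined in the overview.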
-