03--DeconRNASeq: An R package for deconvolution of RNA-seq expression data

DeconRNASeq: A Statistical Framework for Deconvolution of Heterogeneous Tissue Samples Based on mRNA-Seq Data

Overview of DeconRNASeq

The DeconRNASeq package uses a nonnegative decomposition algorithm, solved through quadratic programming, to estimate the mixing proportions of distinct tissue types in next-generation sequencing data. It requires two R data frames as input, datasets and signatures:

datasets : the raw mRNA expression data matrix (genes by samples)
signatures : known signatures of specific cell types or tissues (genes by cell types)
A : the cell type concentration matrix (cell types by samples), which is the quantity to be estimated

The three matrices are related by the model datasets = signatures * A, where * is a matrix product.
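To make the model concrete, here is a minimal base-R sketch with made-up numbers. The names S, A, X and A_hat and all values are illustrative assumptions, not part of the package. The sketch builds a noise-free mixture from a toy signature matrix and recovers the concentrations with ordinary least squares. DeconRNASeq itself solves a constrained (nonnegative) quadratic program, which this stand-in does not reproduce.

```r
# Toy signature matrix: 4 genes (rows) by 2 cell types (columns); values are made up
S <- matrix(c(10, 1,
               2, 8,
               5, 5,
               0, 3),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("gene", 1:4), c("typeA", "typeB")))

# Toy concentration matrix: 2 cell types (rows) by 3 samples (columns)
A <- matrix(c(0.2, 0.5, 0.9,
              0.8, 0.5, 0.1),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("typeA", "typeB"), paste0("sample", 1:3)))

# Forward model: mixture expression = signatures %*% concentrations
X <- S %*% A

# Unconstrained least-squares estimate of A (a stand-in for the package's solver)
A_hat <- qr.solve(S, X)

print(dim(X))           # genes by samples
print(round(A_hat, 4))  # estimated concentrations, cell types by samples
```

With noise-free data and linearly independent signatures, the unconstrained fit already returns the true concentrations. The nonnegativity constraint matters for real, noisy data, where an unconstrained fit can produce negative fractions.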

Pipeline for using DeconRNASeq

Install the DeconRNASeq package

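## biocLite() is deprecated; current Bioconductor releases install through BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DeconRNASeq")
library(DeconRNASeq)
## view the documentation
browseVignettes("DeconRNASeq")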

Run the example

## multi_tissue: expression profiles for 10 mixed samples from multiple tissues
data(multi_tissue)
## mixture expression data (genes by samples)
datasets <- x.data[,2:11]
## tissue-specific signatures for different human tissues
signatures <- x.signature.filtered.optimal[,2:6]
## known mixing proportions (samples by tissues), used to evaluate the estimates
proportions <- fraction
## deconvolution
DeconRNASeq(datasets, signatures, proportions, checksig=FALSE,
              known.prop = TRUE, use.scale = TRUE, fig = TRUE)

datasets:

The datasets matrix contains 28745 genes and 10 samples. Column names are the sample names and row names are the gene names.

> head(datasets,3)
            reads.1.RPKM reads.2.RPKM reads.3.RPKM reads.4.RPKM reads.5.RPKM
NR_024540      3.6682100      3.78953     8.254980     7.693440     5.637220
NR_028325.1    0.0796274      0.14644     0.104652     0.376109     0.104008
NR_028322.1    0.0796274      0.14644     0.104652     0.376109     0.104008
            reads.6.RPKM reads.7.RPKM reads.8.RPKM reads.9.RPKM reads.10.RPKM
NR_024540       6.358460     5.941820     6.555140     7.784240     5.9895300
NR_028325.1     0.160564     0.188188     0.133709     0.244789     0.0885794
NR_028322.1     0.160564     0.188188     0.133709     0.244789     0.0885794
> dim(datasets)
[1] 28745    10

signatures:

The filtered signature matrix contains 1570 genes for the five tissues. Row names are the gene names and column names are the tissues (or cell types) in the mixture.

> head(signatures,3)
                brain    muscle     lung    liver     heart
NR_024540   2.4742600 3.3782600 3.093570 1.279540 0.8652710
NR_028325.1 0.0675838 0.0556031 0.515925 0.085452 0.0830035
NR_028322.1 0.0675838 0.0556031 0.515925 0.085452 0.0830035
> dim(signatures)
[1] 1570    5
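The signature matrix covers far fewer genes (1570) than the mixture matrix (28745), and both are labelled by gene name, so it can be worth checking the overlap before running the deconvolution. The sketch below uses small made-up data frames; datasets_toy, signatures_toy and the gene IDs are hypothetical, and it only illustrates the check, not what the package does internally.

```r
# Toy stand-ins for the two inputs; gene IDs and values are made up
datasets_toy <- data.frame(
  sample1 = c(3.7, 0.08, 1.2, 5.0),
  sample2 = c(3.8, 0.15, 0.9, 4.1),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)
signatures_toy <- data.frame(
  brain = c(2.5, 0.07, 1.1),
  liver = c(1.3, 0.09, 0.4),
  row.names = c("geneA", "geneB", "geneX")
)

# Genes present in both matrices
common_genes <- intersect(rownames(signatures_toy), rownames(datasets_toy))

# Signature genes that have no counterpart in the mixture data
missing_genes <- setdiff(rownames(signatures_toy), rownames(datasets_toy))

# Restrict both inputs to the shared genes, in the same row order
datasets_common   <- datasets_toy[common_genes, , drop = FALSE]
signatures_common <- signatures_toy[common_genes, , drop = FALSE]

print(common_genes)
print(missing_genes)
```

On the real data, the same two calls with datasets and signatures in place of the toy objects would show how many of the 1570 signature genes are actually available in the mixture samples.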

proportions:

This matrix gives the proportions of the different tissues (or cell types) in each sample. Here it holds the proportions of 5 tissues in 10 samples, with samples as rows and tissues as columns.

> head(proportions,3)
              brain muscle   lung  liver  heart
reads.1.RPKM 0.0463 0.0323 0.0805 0.0747 0.7662
reads.2.RPKM 0.0606 0.1156 0.0278 0.6960 0.1000
reads.3.RPKM 0.0728 0.6058 0.1051 0.1262 0.0900
> dim(proportions)
[1] 10  5
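Two quick sanity checks on a proportions matrix are that each sample's fractions add up to about 1, and that estimated fractions stay close to the known ones. The sketch below uses the three rows printed above. The estimated matrix is a hypothetical placeholder, not output from DeconRNASeq, and the per-tissue RMSE is just one reasonable way to summarize the difference.

```r
# First three rows of the known proportions, copied from the output above
known <- matrix(c(0.0463, 0.0323, 0.0805, 0.0747, 0.7662,
                  0.0606, 0.1156, 0.0278, 0.6960, 0.1000,
                  0.0728, 0.6058, 0.1051, 0.1262, 0.0900),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("reads.", 1:3, ".RPKM"),
                                c("brain", "muscle", "lung", "liver", "heart")))

# Each sample's fractions should add up to roughly 1
row_totals <- rowSums(known)
print(row_totals)

# Hypothetical estimates (NOT package output): the known values shifted slightly
estimated <- known + 0.01

# Root mean square error per tissue, computed across samples
rmse <- sqrt(colMeans((estimated - known)^2))
print(rmse)
```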
