03--DeconRNASeq: An R package for deconvolution of RNA-seq expression data

DeconRNASeq: A Statistical Framework for Deconvolution of Heterogeneous Tissue Samples Based on mRNA-Seq Data

Overview of DeconRNASeq

The DeconRNASeq package uses a nonnegative decomposition algorithm, solved by quadratic programming, to estimate the mixing proportions of distinct tissue types in next-generation sequencing data. It requires two R data frames as input, datasets and signatures, which are related by the linear mixing model datasets = signatures * A (a matrix product):

  • datasets : the raw mRNA expression data matrix (genes by samples)
  • signatures : known expression signatures of specific cell types or tissues (genes by cell types)
  • A : the cell type concentration matrix (cell types by samples). This is the quantity the package estimates, not an input.
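
The mixing model above can be illustrated with a small toy example in base R. This is only a sketch under stated assumptions: the matrices are made-up numbers, and ordinary least squares (qr.solve) stands in for the constrained quadratic program that the package actually solves. On noise-free toy data the two coincide, because the exact solution already satisfies the nonnegativity constraint; on real data they generally would not.

```r
# Toy signatures: 4 genes x 3 cell types (hypothetical values)
signatures <- matrix(c(10, 1, 0,
                        0, 8, 2,
                        1, 0, 9,
                        5, 5, 5),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(paste0("gene", 1:4),
                                     c("typeA", "typeB", "typeC")))

# True concentration matrix A: 3 cell types x 2 samples.
# matrix() fills column by column, so each column (sample) sums to 1.
A <- matrix(c(0.2, 0.5, 0.3,
              0.7, 0.1, 0.2),
            nrow = 3,
            dimnames = list(colnames(signatures), c("sample1", "sample2")))

# Simulate the observed mixtures: datasets = signatures * A (genes x samples)
datasets <- signatures %*% A

# Recover A by least squares. This is a simplified stand-in for the
# nonnegative quadratic program, valid here only because there is no noise.
A_hat <- qr.solve(signatures, datasets)

print(round(A_hat, 4))
print(colSums(A_hat))  # estimated proportions per sample
```

With noisy expression values, an unconstrained fit like this can return negative entries, which is why a constrained solver is used in practice.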

Pipeline for using DeconRNASeq

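  • Install the DeconRNASeq package

    ## On current Bioconductor releases, BiocManager replaces the older
    ## biocLite() installer
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("DeconRNASeq")
    library(DeconRNASeq)
    ## view the documentation
    browseVignettes("DeconRNASeq")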
    
  • Run the example

    ## multi_tissue: expression profiles for 10 mixed samples from multiple tissues
    data(multi_tissue)
    ## columns 2 to 11 hold the expression values of the 10 samples
    datasets <- x.data[,2:11]
    ## tissue-specific signatures for five human tissues
    signatures <- x.signature.filtered.optimal[,2:6]
    ## known mixing proportions of the samples
    proportions <- fraction
    ## deconvolution
    DeconRNASeq(datasets, signatures, proportions, checksig = FALSE,
                known.prop = TRUE, use.scale = TRUE, fig = TRUE)
    
    • datasets:

      The datasets matrix contains 28745 genes and 10 samples. Column names are the sample names and row names are the gene names.

      > head(datasets,3)
                  reads.1.RPKM reads.2.RPKM reads.3.RPKM reads.4.RPKM reads.5.RPKM
      NR_024540      3.6682100      3.78953     8.254980     7.693440     5.637220
      NR_028325.1    0.0796274      0.14644     0.104652     0.376109     0.104008
      NR_028322.1    0.0796274      0.14644     0.104652     0.376109     0.104008
                  reads.6.RPKM reads.7.RPKM reads.8.RPKM reads.9.RPKM reads.10.RPKM
      NR_024540       6.358460     5.941820     6.555140     7.784240     5.9895300
      NR_028325.1     0.160564     0.188188     0.133709     0.244789     0.0885794
      NR_028322.1     0.160564     0.188188     0.133709     0.244789     0.0885794
      > dim(datasets)
      [1] 28745    10
      
    • signatures:

      The filtered signature matrix contains 1570 genes for the five tissues. Row names are the gene names and column names are the tissues (or cell types) in the mixture.

      > head(signatures,3)
                      brain    muscle     lung    liver     heart
      NR_024540   2.4742600 3.3782600 3.093570 1.279540 0.8652710
      NR_028325.1 0.0675838 0.0556031 0.515925 0.085452 0.0830035
      NR_028322.1 0.0675838 0.0556031 0.515925 0.085452 0.0830035
      > dim(signatures)
      [1] 1570    5
      
    • proportions:

      This matrix gives the known proportions of the different tissues (or cell types) in each sample, here 5 tissues in 10 samples. Rows are samples and columns are tissues, so it is the transpose of A as defined in the overview.

      > head(proportions,3)
                    brain muscle   lung  liver  heart
      reads.1.RPKM 0.0463 0.0323 0.0805 0.0747 0.7662
      reads.2.RPKM 0.0606 0.1156 0.0278 0.6960 0.1000
      reads.3.RPKM 0.0728 0.6058 0.1051 0.1262 0.0900
      > dim(proportions)
      [1] 10  5
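
The three inputs described above have to agree with each other: the signature genes should appear among the rows of datasets, the tissue columns of signatures and proportions should match, and proportions should have one row per sample with each row summing to about 1. The following base R sketch checks these conditions on small made-up data frames. The helper check_inputs and all values are hypothetical and are not part of DeconRNASeq; the package may perform its own matching internally.

```r
# Toy stand-ins for the three inputs (hypothetical values, not the package data)
datasets <- data.frame(
  s1 = c(3.7, 0.08, 1.2, 5.0),
  s2 = c(3.8, 0.15, 0.9, 4.1),
  row.names = c("NR_024540", "NR_028325.1", "geneX", "geneY")
)
signatures <- data.frame(
  brain = c(2.47, 0.068),
  liver = c(1.28, 0.085),
  row.names = c("NR_024540", "NR_028325.1")
)
proportions <- data.frame(
  brain = c(0.3, 0.6),
  liver = c(0.7, 0.4),
  row.names = c("s1", "s2")
)

# Hypothetical helper: returns one TRUE/FALSE flag per consistency check
check_inputs <- function(datasets, signatures, proportions, tol = 1e-3) {
  list(
    # every signature gene is present in the expression matrix
    genes_found = all(rownames(signatures) %in% rownames(datasets)),
    # tissue columns of signatures and proportions line up
    tissues_match = identical(colnames(signatures), colnames(proportions)),
    # proportions has one row per sample, in the order of datasets' columns
    samples_match = identical(rownames(proportions), colnames(datasets)),
    # each sample's proportions sum to 1 within the tolerance
    rows_sum_to_one = all(abs(rowSums(proportions) - 1) < tol)
  )
}

checks <- check_inputs(datasets, signatures, proportions)
str(checks)

# A deliberately broken input: the first sample's proportions sum to 1.2
bad_proportions <- proportions
bad_proportions["s1", "brain"] <- 0.5
bad_checks <- check_inputs(datasets, signatures, bad_proportions)
```

The tolerance is loose on purpose, since proportions rounded to four decimals (as in the output above) may sum to 0.9999 rather than exactly 1.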
      
