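As many of you know, the TCGA data was updated this year: the former HTSeq - Counts workflow has been replaced by STAR - Counts. The download procedure in the TCGAbiolinks package changed slightly as a result. This post walks through the updated TCGAbiolinks download workflow for reference.
1. Loading the R packages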
library(TCGAbiolinks)
library(SummarizedExperiment)
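These are the main packages needed. SummarizedExperiment is used to extract the different data types (counts, TPM, ...) from the prepared object.
2. Downloading the data
First, list the projects that TCGAbiolinks can download from: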
> getGDCprojects()$project_id
[1] "EXCEPTIONAL_RESPONDERS-ER" "GENIE-GRCC"
[3] "GENIE-DFCI" "GENIE-NKI"
[5] "GENIE-VICC" "GENIE-UHN"
[7] "GENIE-MDA" "GENIE-MSK"
[9] "GENIE-JHU" "FM-AD"
[11] "OHSU-CNL" "MMRF-COMMPASS"
[13] "ORGANOID-PANCREATIC" "NCICCR-DLBCL"
[15] "VAREPOP-APOLLO" "CGCI-BLGSP"
[17] "BEATAML1.0-CRENOLANIB" "TRIO-CRU"
[19] "REBC-THYR" "TARGET-ALL-P2"
[21] "TARGET-ALL-P1" "CPTAC-2"
[23] "WCDT-MCRPC" "CMI-ASC"
[25] "TCGA-READ" "TCGA-UCS"
[27] "CMI-MPC" "CMI-MBC"
[29] "BEATAML1.0-COHORT" "TCGA-COAD"
[31] "TCGA-CESC" "TCGA-PAAD"
[33] "TCGA-ESCA" "TCGA-KIRP"
[35] "TCGA-PCPG" "TCGA-HNSC"
[37] "TCGA-BLCA" "TCGA-STAD"
[39] "CTSP-DLBCL1" "TCGA-SARC"
[41] "TCGA-CHOL" "TCGA-LAML"
[43] "TCGA-THYM" "TCGA-ACC"
[45] "TCGA-SKCM" "TCGA-LUAD"
[47] "TCGA-LIHC" "TCGA-KIRC"
[49] "TCGA-KICH" "TCGA-DLBC"
[51] "TCGA-PRAD" "TCGA-OV"
[53] "TCGA-MESO" "TCGA-LUSC"
[55] "TCGA-GBM" "TCGA-UVM"
[57] "TCGA-LGG" "HCMI-CMDC"
[59] "TCGA-BRCA" "TARGET-RT"
[61] "TARGET-CCSK" "TCGA-TGCT"
[63] "TARGET-NBL" "CPTAC-3"
[65] "CGCI-HTMCP-CC" "TARGET-ALL-P3"
[67] "TARGET-OS" "TARGET-AML"
[69] "TARGET-WT" "MP2PRT-WT"
[71] "TCGA-THCA" "TCGA-UCEC"
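The list mixes TCGA with other GDC programs (TARGET, CPTAC, GENIE, and so on). Below is a minimal sketch for narrowing it down to TCGA projects. To keep it runnable offline, it assumes a small hand-copied subset of the output above as a stand-in for the live `getGDCprojects()$project_id` call; the variable names are illustrative only.

```r
# Stand-in for getGDCprojects()$project_id:
# a few IDs copied from the output above
project_ids <- c("GENIE-MSK", "TARGET-AML", "TCGA-COAD",
                 "CPTAC-3", "TCGA-READ", "TCGA-BRCA")

# Keep only the IDs that start with the "TCGA-" prefix
tcga_ids <- grep("^TCGA-", project_ids, value = TRUE)

# Sort alphabetically so the list is easier to scan
tcga_ids <- sort(tcga_ids)
print(tcga_ids)
```

With the real package loaded, the same `grep()` call can be applied directly to `getGDCprojects()$project_id`.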
Colon cancer (TCGA-COAD) is used as the example here:
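# Build the query for the gene expression quantification files
COAD <- GDCquery(project = "TCGA-COAD",
                 data.category = "Transcriptome Profiling",
                 data.type = "Gene Expression Quantification",
                 workflow.type = "STAR - Counts")
# Download the files matched by the query through the GDC API
GDCdownload(COAD, method = "api")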
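For the workflow.type argument, always use "STAR - Counts", whether you ultimately want counts, TPM, or FPKM. Choosing between these data types is covered further below.
After a long wait, the download finally finishes. By default the files are stored in a GDCdata folder under the current working directory; this can be changed with the directory argument of GDCdownload.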
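As a sketch of how the directory argument might be used, the helper below wraps the query and download steps. `download_star_counts`, `data_dir`, and `chunk_size` are hypothetical names introduced for illustration and are not part of TCGAbiolinks; `directory` and `files.per.chunk` are existing arguments of `GDCdownload`. The chunk size of 10 is an arbitrary assumption, not a recommended value.

```r
# Hypothetical helper (not part of TCGAbiolinks): query and download
# STAR - Counts expression data for one project into a chosen directory.
download_star_counts <- function(project, data_dir = "GDCdata",
                                 chunk_size = 10) {
  query <- TCGAbiolinks::GDCquery(
    project       = project,
    data.category = "Transcriptome Profiling",
    data.type     = "Gene Expression Quantification",
    workflow.type = "STAR - Counts"
  )
  # files.per.chunk splits the download into smaller batches,
  # which can help when a single large request keeps failing
  TCGAbiolinks::GDCdownload(query,
                            method          = "api",
                            directory       = data_dir,
                            files.per.chunk = chunk_size)
  # Return the query so it can be passed to GDCprepare() afterwards
  invisible(query)
}
```

A call might look like `COAD <- download_star_counts("TCGA-COAD", data_dir = "data/GDC")`. If the directory is changed from the default, the same value has to be passed to `GDCprepare` through its own `directory` argument so that it can find the files.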
3. Merging and extracting the data
expr <- GDCprepare(query=COAD)
This command assembles the downloaded files into a single SummarizedExperiment object.
If you need the counts, extract them directly from this object:
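Before extracting anything, it can help to look at what the object contains. The sketch below is one way to do that; `describe_se` is a hypothetical helper introduced here, built only on standard SummarizedExperiment accessors (`assayNames`, `colData`, `rowData`).

```r
# Hypothetical helper: summarize the main parts of a
# SummarizedExperiment object such as the one GDCprepare() returns
describe_se <- function(se) {
  list(
    n_genes        = nrow(se),   # rows are genes
    n_samples      = ncol(se),   # columns are samples
    assays         = SummarizedExperiment::assayNames(se),
    sample_columns = colnames(SummarizedExperiment::colData(se)),
    gene_columns   = colnames(SummarizedExperiment::rowData(se))
  )
}
```

Calling `describe_se(expr)` returns the dimensions, the names of the available assays, and the column names of the sample and gene annotation tables.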
count <- as.data.frame(assay(expr))
For data types other than counts, change the i argument in this step:
TPM <- as.data.frame(assay(expr,i = "tpm_unstrand"))
The values of i for extracting each data type are:
Counts: i = "unstranded"
TPM: i = "tpm_unstrand"
FPKM: i = "fpkm_unstrand"
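Because the assay name must match exactly (a stray space is enough to make the lookup fail), a small wrapper that checks the name first can give a clearer error. This is a minimal sketch; `get_assay_df` is a hypothetical helper, not a TCGAbiolinks or SummarizedExperiment function.

```r
# Hypothetical helper: extract one assay as a data frame,
# checking the assay name against those actually present
get_assay_df <- function(se, name = "unstranded") {
  available <- SummarizedExperiment::assayNames(se)
  if (!name %in% available) {
    stop("Assay '", name, "' not found. Available assays: ",
         paste(available, collapse = ", "))
  }
  as.data.frame(SummarizedExperiment::assay(se, name))
}
```

For example, `FPKM <- get_assay_df(expr, "fpkm_unstrand")` would return the FPKM table, while `get_assay_df(expr, " fpkm_unstrand")` would stop with a message listing the valid names.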