0. Open this website:
https://www.cbioportal.org/
1. The page looks like this:
Home page
2. Click Data Sets and enter the cancer type you are interested in. Here my choice is: Breast Invasive Carcinoma (TCGA, PanCancer Atlas)
Search for the cancer you are interested in (here the search term is breast)
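3. After opening the validation dataset you are interested in, find the download icon next to the dataset name and copy its link.
Find the download icon and copy the link
Then download and extract the archive from the shell: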
wget https://cbioportal-datahub.s3.amazonaws.com/brca_tcga_pan_can_atlas_2018.tar.gz
tar -xzvf brca_tcga_pan_can_atlas_2018.tar.gz
Files after extraction
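Among them, data_mutations.txt is the MAF file. Read it in RStudio and take a look (this assumes the R working directory is the folder that contains the extracted files).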
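# lib.loc points to my personal R library; drop it if maftools is in your default library
library(maftools, lib.loc = "/home/u24211510018/R/x86_64-conda-linux-gnu-library/4.2")
# Read the MAF file into a MAF object
brca_maf <- read.maf(maf = "./data_mutations.txt")
# Preview the first 5 rows and the first 10 columns
brca_maf@data[1:5, 1:10]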
A first look at brca_maf
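As a cross-check outside R, the same file can be inspected from the shell. This is only a minimal sketch: the helper names maf_columns and maf_row_count are made up for illustration, and it assumes that lines starting with # are comments and that the first other line is the tab-separated header. The row count may be larger than what brca_maf@data shows because, as far as I know, read.maf keeps synonymous variants in a separate slot by default.

```shell
# Print the column names of a MAF file, one per line.
# Assumption: '#' lines are comments; the first other line is the header.
maf_columns() {
  awk -F '\t' '
    /^#/ { next }                                   # skip comment lines
    { for (i = 1; i <= NF; i++) print $i; exit }    # print header fields, then stop
  ' "$1"
}

# Count data rows: non-comment, non-empty lines minus the header line.
maf_row_count() {
  awk '!/^#/ && NF { n++ } END { print (n > 0 ? n - 1 : 0) }' "$1"
}

# Run the checks only if the file is present in the current directory.
if [ -f ./data_mutations.txt ]; then
  maf_columns ./data_mutations.txt | head -n 10
  maf_row_count ./data_mutations.txt
fi
```

The first ten names printed here should match the ten columns shown by the R preview above.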
4. Alternative methods:
4.1 Download with the R package TCGAbiolinks. For TCGA-BRCA, however, my download kept failing and I don't know why; maybe the files are too large? Anyway, here I just want to provide another way to download the data.
library(TCGAbiolinks)
query <- GDCquery(
project = "TCGA-BRCA",
data.category = "Simple Nucleotide Variation",
data.type = "Masked Somatic Mutation",
access = "open"
)
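# Download the files matched by the query
GDCdownload(query)
# Load the downloaded files and also save the result to an .Rdata file
GDCprepare(query, save = TRUE, save.filename = "TCGA-BRCA_SNP.Rdata")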
4.2 Select and download the data yourself from the GDC website. Since every patient comes as a separate file, this is more tedious.
https://portal.gdc.cancer.gov/
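If you do go this way, the per-patient files can be stitched into one table before reading them in R. A minimal sketch, not a tested pipeline: the function name merge_maf_files and the example paths are hypothetical, and it assumes every file has the same columns, that # lines are comments, and that files ending in .gz are gzip-compressed.

```shell
# Merge several per-patient MAF files into one table on stdout.
# Usage (paths are placeholders): merge_maf_files downloads/*/*.maf.gz > merged.maf
merge_maf_files() {
  local first=1 f
  for f in "$@"; do
    # Decompress .gz inputs, pass plain-text inputs through unchanged
    case "$f" in
      *.gz) gzip -cd -- "$f" ;;
      *)    cat -- "$f" ;;
    esac | awk -v first="$first" '
      /^#/  { next }                                 # drop comment lines
      !seen { seen = 1; if (first) print; next }     # header: keep it from the first file only
            { print }                                # data rows
    '
    first=0
  done
}
```

The merged file can then be passed to read.maf in the same way as data_mutations.txt in step 3.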
Last:
If you found this article helpful, please give it a thumbs-up 👍, share it with others, and follow me for more content like this!