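The local version of GISTIC2 is hard to use, so the online version is recommended: https://cloud.genepattern.org/gp/pages/login.jsf
For the procedure, see: https://www.jianshu.com/p/4312a453b4a4?clicktime=1578478942
Note, however, that the code that author uses to prepare the input data contains an error. It deletes everything after the 16th character of each sample name, so different samples end up with the same sample name, and GISTIC2 then fails with an error about overlap. Keeping the sample names unchanged avoids the problem. The code is below, for reference only: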
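library(dplyr)          # provides %>% and filter()
library(TCGAbiolinks)

# Query and download the masked copy number segments for TCGA-GBM
query <- GDCquery(project = "TCGA-GBM",
                  data.category = "Copy Number Variation",
                  data.type = "Masked Copy Number Segment")
GDCdownload(query, method = "api", files.per.chunk = 100)
segment_dat <- GDCprepare(query = query)
head(segment_dat)

# Put the full, untruncated sample name in the first column; this is what gets written out
segment_dat$GDC_Aliquot <- segment_dat$Sample
# Truncate Sample to 16 characters only so it can be used for filtering
segment_dat$Sample <- substring(segment_dat$Sample, 1, 16)
head(segment_dat)

# Keep only rows whose truncated sample name ends in 01A
segment_dat <- segment_dat %>% filter(grepl("01A$", Sample))
# The original bug was here: segment_dat[,1] <- segment_dat$Sample
# Never write this; it overwrites column 1 with the truncated names
# Drop column 7 (the truncated Sample column) before writing
segment_dat <- segment_dat[, -7]
write.table(segment_dat, "MaskedCopyNumberSegment-2.txt", sep = "\t",
            quote = FALSE, col.names = FALSE, row.names = FALSE)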
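
The name collision described above is easy to reproduce without downloading anything. The following is a minimal sketch; the three barcodes are made up for illustration (they are not real TCGA aliquots), and the variable names are hypothetical. It only shows that names which are unique at full length can become identical after truncation to 16 characters.

```r
# Hypothetical full-length barcodes, invented for this example.
# The first two differ only after the 16th character.
barcodes <- c("TCGA-00-0001-01A-01D-0001-01",
              "TCGA-00-0001-01A-02D-0002-01",
              "TCGA-00-0002-01A-01D-0001-01")

# Same truncation as in the script: keep the first 16 characters.
truncated <- substring(barcodes, 1, 16)

# anyDuplicated() returns 0 when all values are unique,
# otherwise the index of the first duplicated value.
dup_full <- anyDuplicated(barcodes)
dup_truncated <- anyDuplicated(truncated)

# Truncated names that are shared by more than one full-length name.
colliding <- unique(truncated[duplicated(truncated)])

print(dup_full)
print(dup_truncated)
print(colliding)
```

The same check can be run on real data before writing the file, for example by comparing the number of unique full-length names in the first column with the number of unique truncated names. If the two counts differ, writing the truncated names would merge segments from different entries under one sample name.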