肿瘤突变负荷(Tumor Mutational Burden, TMB),指肿瘤细胞基因组中,基因的外显子编码区每兆碱基中发生置换和插入或缺失突变的总数。从理论上来说,高TMB(High Tumor Mutational Burden, TMB-H)的肿瘤患者,具有获得更多新生抗原的潜力,并且与肿瘤内异质性有关,能增强肿瘤免疫原性以及与ICI的反应。
要计算样本的TMB值,必须先获取样本的突变数据,TCGA 的突变流程有 4 种,分别是:muse, varscan2, somaticsniper, mutect2。其中 muse 和 somaticsniper 只能计算点突变,无法识别 indel。而目前新版的TCGA不能直接下载4种制作好的maf文件,可通过 "TCGAbiolinks" 整理
1. 使用包整理突变数据
1.1 MAF文件的下载
# 安装并加载所需的R包
library(TCGAbiolinks)
query <- GDCquery(
project = "TCGA-STAD",
data.category = "Simple Nucleotide Variation",
data.type = "Masked Somatic Mutation",
access = "open"
)
GDCdownload(query)
GDCprepare(query, save = T,save.filename = "TCGA-STAD_SNP.Rdata") # 这里的Rdata文件是一个数据框,可直接用maftools读取使用
1.2 maftools读取处理MAF文件
library(maftools)
load(file = "TCGA-COAD_SNP.Rdata")
maf.stad <- data
## 查看数据
class(maf.stad)
# [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
dim(maf.stad)
# [1] 183107 140
maf.stad[1:5, 1:10]
# # A tibble: 5 × 10
# Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position End_Position Strand Variant_Classification Variant_Type
# <chr> <int> <chr> <chr> <chr> <int> <int> <chr> <chr> <chr>
# 1 NEXN 91624 BI GRCh38 chr1 77929416 77929416 + Missense_Mutation SNP
# 2 COL24A1 255631 BI GRCh38 chr1 86125858 86125858 + Missense_Mutation SNP
# 3 LRRC8B 23507 BI GRCh38 chr1 89593023 89593023 + Frame_Shift_Del DEL
# 4 BPNT1 10380 BI GRCh38 chr1 220069429 220069429 + Missense_Mutation SNP
# 5 KCNF1 3754 BI GRCh38 chr2 10913244 10913244 + Missense_Mutation SNP
maf <- read.maf(maf.stad)
# -Validating
# -Silent variants: 45460
# -Summarizing
# --Possible FLAGS among top ten genes:
# TTN
# MUC16
# SYNE1
# FLG
# -Processing clinical data
# --Missing clinical data
# -Finished in 8.084s elapsed (23.5s cpu)
1.3 可视化
# 绘制MAF文件的整体结果图
plotmafSummary(maf = maf, rmOutlier = TRUE, addStat = 'median', dashboard = TRUE)
★ 错义突变和移码插入突变较多;C>T 的点突变类型是最多的;TTN 和 MUC16 两个基因的突变次数最多
# 绘制oncoplot图(只展示前 15 个基因)
vc_cols <- RColorBrewer::brewer.pal(n = 8, name = 'Paired')
names(vc_cols) <- c(
'Frame_Shift_Del',
'Missense_Mutation',
'Nonsense_Mutation',
'Multi_Hit',
'Frame_Shift_Ins',
'In_Frame_Ins',
'Splice_Site',
'In_Frame_Del'
)
oncoplot(maf = maf, colors = vc_cols, top = 15)
# 使用 somaticInteractions 函数检测互斥突变或同时突变的基因
somaticInteractions(maf = maf, top = 25, pvalue = c(0.05, 0.1))
★ 上方的条形图展示的是样本的突变负荷(若有样本的突变负荷特别高,在后续分析时可考虑是否将其作为离群点去除),右侧的直方图展示了基因的每种突变类型的情况
★ 突变率靠前的基因中,同时出现突变的基因对偏多
# 绘制Oncostrip(展示特定基因在样本中的突变情况)
oncostrip(maf = maf, genes = c("PTGS2","NNMT","CA9"))
# 绘制Transitions Transversions汇总图
laml.titv = titv(maf = maf, plot = FALSE, useSyn = TRUE)
plotTiTv(res = laml.titv)
★ 转换和颠换图 (Transition and transversion plot, Ti/Tv) 显示了STAD中SNV的分布,分为 6 个转换和颠换事件。堆积条形图(底部)显示了MAF文件中每个样本的突变谱分布
2. 计算 TMB
stad.tmb <- tmb(maf, captureSize = 38, logScale = T)
dim(stad.tmb)
# [1] 431 6
head(stad.tmb)
# Tumor_Sample_Barcode total total_perMB total_perMB_log group Tumor_Sample_ID
# 1: TCGA-RD-A8N4-01A-21D-A364-08 1 0.02631579 -1.5797836 TMB_low TCGA-RD-A8N4
# 2: TCGA-HU-A4GJ-01A-11D-A25D-08 2 0.05263158 -1.2787536 TMB_low TCGA-HU-A4GJ
# 3: TCGA-RD-A8N0-01A-12D-A364-08 2 0.05263158 -1.2787536 TMB_low TCGA-RD-A8N0
# 4: TCGA-FP-8210-01A-11D-2340-08 3 0.07894737 -1.1026623 TMB_low TCGA-FP-8210
# 5: TCGA-VQ-A8PQ-01A-11D-A410-08 5 0.13157895 -0.8808136 TMB_low TCGA-VQ-A8PQ
# 6: TCGA-BR-A44T-01A-32D-A24D-08 7 0.18421053 -0.7346856 TMB_low TCGA-BR-A44T
# 根据TMB平均值进行分组
library(dplyr)
stad.tmb <- stad.tmb %>% mutate(group = if_else(total_perMB_log > mean(total_perMB_log), "TMB_high","TMB_low"))
3. 结合临床数据进行生存分析
# 只取前12位患者编号
stad.tmb$patient <- substr(stad.tmb$Tumor_Sample_Barcode, 1, 12)
# 加载自身临床数据
clin_info[1:4,1:4]
# patient entity_submitter_id status time
# 1 TCGA-BR-A4J4 TCGA-BR-A4J4-01A-12R-A251-31 0 16
# 2 TCGA-BR-A4IZ TCGA-BR-A4IZ-01A-32R-A251-31 1 273
# 3 TCGA-RD-A7C1 TCGA-RD-A7C1-01A-11R-A32D-31 1 507
# 4 TCGA-BR-6852 TCGA-BR-6852-01A-11R-1884-13 0 1367
TMB <- merge(clin, stad.tmb, by = "patient")
fit <- survfit(Surv(time, status) ~ group, data = TMB)
ggsurvplot(fit, data = TMB,
pval = T,
risk.table = T,
surv.median.line = "hv", #添加中位生存曲线
palette = c("red", "blue"), #更改线的颜色
legend.labs = c("High risk","Low risk"), #标签
legend.title = "TMB Score",
title = "Overall survival", #标题
ylab = "Cumulative survival (percentage)", xlab = " Time (Days)", #更改横纵坐标
censor.shape = 124, censor.size = 2, conf.int = FALSE, #删失点的形状和大小
break.x.by = 500 #横坐标间隔
)