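As usual, here is the link to the learning materials first:
Introduction to MutationalPatterns
Mutational processes leave characteristic footprints in genomic DNA. This package provides a comprehensive and flexible set of functions that let researchers easily evaluate and visualize a multitude of mutational patterns in base substitution catalogues of, for example, healthy samples, tumour samples, or DNA repair-deficient cells.
The package covers a wide range of patterns, including: mutational signatures, transcriptional and replicative strand bias, lesion segregation, genomic distribution, and association with genomic features.
The package works with single nucleotide variants (SNVs), insertions and deletions (indels), double base substitutions (DBSs) and larger multi base substitutions (MBSs).
It provides functions both for extracting mutational signatures de novo and for determining the contribution of previously identified mutational signatures at the level of a single sample.
This tutorial shows how to use some of the package's common functions.
Install and load: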
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("MutationalPatterns")
library(MutationalPatterns)
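Input data
The input for this package is a set of VCF files together with the matching reference genome sequence.
Reference genomes are provided through the BSgenome package (the chosen genome package must be installed before it can be loaded):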
library(BSgenome)
head(available.genomes())
ref_genome <- "BSgenome.Hsapiens.UCSC.hg19"
library(ref_genome, character.only = TRUE)
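The package ships with two example data sets:
- One contains somatic SNV catalogues of 9 normal human adult stem cells from 3 different tissues (Blokzijl et al. 2016)
- The other contains somatic indels and DBSs from 3 healthy human hematopoietic stem cells (Osorio et al. 2018)
Load the example SNV data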
vcf_files <- list.files(system.file("extdata", package = "MutationalPatterns"),
pattern = "sample.vcf", full.names = TRUE
)
sample_names <- c(
"colon1", "colon2", "colon3",
"intestine1", "intestine2", "intestine3",
"liver1", "liver2", "liver3"
)
# Build a GRangesList object
grl <- read_vcfs_as_granges(vcf_files, sample_names, ref_genome)
# Record the tissue of each sample
tissue <- c(rep("colon", 3), rep("intestine", 3), rep("liver", 3))
grl is a GRangesList with one GRanges element per sample.
Load the example indel, DBS and MBS data
blood_vcf_fnames <- list.files(
system.file("extdata", package = "MutationalPatterns"),
pattern = "blood.*vcf", full.names = TRUE)
# Set their sample names.
blood_sample_names <- c("blood1", "blood2", "blood3")
# type = "all" keeps every mutation type (no filtering)
blood_grl <- read_vcfs_as_granges(blood_vcf_fnames, blood_sample_names, ref_genome, type = "all")
# You can now retrieve different types of mutations from the GrangesList.
snv_grl <- get_mut_type(blood_grl, type = "snv")
indel_grl <- get_mut_type(blood_grl, type = "indel")
dbs_grl <- get_mut_type(blood_grl, type = "dbs")
mbs_grl <- get_mut_type(blood_grl, type = "mbs")
You can also select a single mutation type directly when reading the data:
indel_grl <- read_vcfs_as_granges(blood_vcf_fnames, blood_sample_names, ref_genome, type = "indel")
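To make the four mutation types concrete, here is a minimal base R sketch that classifies variants purely by the lengths of their REF and ALT alleles. The function name `classify_mutation` and the example alleles are hypothetical, and this is a simplification rather than what `get_mut_type` does internally (the package may, for instance, also treat neighbouring SNVs as DBSs or MBSs).

```r
# Hypothetical helper: classify variants by REF/ALT allele length only.
classify_mutation <- function(ref, alt) {
  ref_len <- nchar(ref)
  alt_len <- nchar(alt)
  ifelse(ref_len != alt_len, "indel",      # lengths differ: insertion or deletion
    ifelse(ref_len == 1, "snv",            # 1 base replaced by 1 base
      ifelse(ref_len == 2, "dbs", "mbs"))) # 2 bases, or more than 2 bases
}

# Made-up example alleles
ref <- c("A", "AT", "CC", "ACG", "G")
alt <- c("G", "A",  "TT", "TTA", "GAC")
mut_types <- classify_mutation(ref, alt)
print(data.frame(ref, alt, type = mut_types))
```

Splitting on allele length alone is enough to see why the four types need separate context definitions later in the tutorial.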
Mutation characteristics
There are four types of mutations:
- SNVs: base substitution types, mutation spectrum, 96 mutational profile, larger contexts
- Indels
- DBSs
- MBSs
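1) SNVs: Base substitution types
mutations_from_vcf: start with the first SNV characteristic, the single base substitutions. The mutations_from_vcf function returns them in the format REF>ALT.
## 1) SNVs: Base substitution types, single sample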
muts <- mutations_from_vcf(grl[[1]])
head(muts, 12)
[1] "A>C" "A>G" "C>T" "A>G" "G>T" "T>A" "T>C" "G>A" "G>A" "C>A" "G>A" "G>T"
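mut_type: the base substitutions can also be converted to the 6 conventional base substitution types: C>A, C>G, C>T, T>A, T>C, T>G
For example, in G>T, G is the reference allele and T is the alternative allele; the G:C>T:A substitution is reported as C>A.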
types <- mut_type(grl[[1]])
head(types, 12)
[1] "T>G" "T>C" "C>T" "T>C" "C>A" "T>A" "T>C" "C>T" "C>T" "C>A" "C>T" "C>A"
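The conversion from 12 possible substitutions to 6 types can be sketched in base R without the package: whenever the reference base is a purine (A or G), both alleles are replaced by their complements. The function name `to_pyrimidine_type` is hypothetical and this is only an illustration of the rule, not the `mut_type` source code; the input is the twelve REF>ALT strings printed above.

```r
# Hypothetical helper: report every substitution from the pyrimidine (C or T) strand.
to_pyrimidine_type <- function(muts) {
  is_purine <- substr(muts, 1, 1) %in% c("A", "G")
  # Complement both alleles; the ">" separator is left untouched by chartr().
  muts[is_purine] <- chartr("ACGT", "TGCA", muts[is_purine])
  muts
}

muts <- c("A>C", "A>G", "C>T", "A>G", "G>T", "T>A",
          "T>C", "G>A", "G>A", "C>A", "G>A", "G>T")
converted <- to_pyrimidine_type(muts)
print(converted)
```

The result can be compared by eye with the `mut_type` output shown above.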
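mut_context: retrieves from the reference genome the sequence context (one base upstream and one base downstream) of the base substitutions in the VCF object.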
context <- mut_context(grl[[1]], ref_genome)
head(context, 12)
chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr2 chr2 chr2
"CAG" "AAC" "ACA" "AAG" "TGA" "GTT" "ATT" "CGC" "AGC" "ACA" "CGT" "GGA"
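type_context: retrieves the types and contexts of all positions in a VCF GRanges object. For substitutions that were converted to one of the 6 conventional types, the reverse complement of the sequence context is returned.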
type_context <- type_context(grl[[1]], ref_genome)
lapply(type_context, head, 12)
$types
[1] "T>G" "T>C" "C>T" "T>C" "C>A" "T>A" "T>C" "C>T" "C>T" "C>A" "C>T" "C>A"
$context
chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr2 chr2 chr2
"CTG" "GTT" "ACA" "CTT" "TCA" "GTT" "ATT" "GCG" "GCT" "ACA" "ACG" "TCC"
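The contexts returned by type_context differ from those returned by mut_context wherever the reference base is a purine. A minimal base R sketch of that step, using the twelve contexts printed by mut_context above: the helper names `reverse_complement` and `to_pyrimidine_context` are hypothetical, and the code only illustrates the rule rather than reproducing the package internals.

```r
# Hypothetical helper: reverse complement of each DNA string in a character vector.
reverse_complement <- function(x) {
  comp <- chartr("ACGT", "TGCA", x)
  vapply(strsplit(comp, "", fixed = TRUE),
         function(ch) paste(rev(ch), collapse = ""),
         character(1), USE.NAMES = FALSE)
}

# Hypothetical helper: flip a trinucleotide context when its middle base is a purine.
to_pyrimidine_context <- function(context) {
  is_purine <- substr(context, 2, 2) %in% c("A", "G")
  context[is_purine] <- reverse_complement(context[is_purine])
  context
}

context <- c("CAG", "AAC", "ACA", "AAG", "TGA", "GTT",
             "ATT", "CGC", "AGC", "ACA", "CGT", "GGA")
oriented <- to_pyrimidine_context(context)
print(oriented)
```

After this step the middle base of every context is C or T, which is what allows type and context to be combined into a single category.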
mut_type_occurrences: counts the occurrences of each mutation type for all VCF objects in the GRangesList.
type_occurrences <- mut_type_occurrences(grl, ref_genome)
type_occurrences
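The result is a table with one row per sample and one column per mutation type.
2) SNVs: Mutation spectrum
A mutation spectrum shows the relative contribution of each mutation type among the base substitutions.
The plot_spectrum function plots the mean relative contribution of each of the 6 base substitution types over all samples.
Error bars indicate the 95% confidence interval over all samples.
The plot title shows the total number of mutations.
opt <- list(od = "your_path")  # replace with your own output directory
## 2) SNVs: Mutation spectrum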
p1 <- plot_spectrum(type_occurrences)
p2 <- plot_spectrum(type_occurrences, CT = TRUE)
p3 <- plot_spectrum(type_occurrences, CT = TRUE, indv_points = TRUE, legend = FALSE)
library(gridExtra)
library(ggplot2)  # provides ggsave()
p <- grid.arrange(p1, p2, p3, ncol = 3, widths = c(3, 3, 1.75))
ggsave(filename = paste0(opt$od, "/Mutation spectrum.png"), width = 12, height = 4, plot = p)
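To see what "mean relative contribution with a 95% confidence interval" means numerically, here is a small base R sketch on made-up counts. The count values and sample names are hypothetical, and the t-based interval is one common choice; whether plot_spectrum computes its error bars in exactly this way is an assumption.

```r
# Hypothetical counts: rows are samples, columns are the 6 substitution types.
counts <- rbind(
  s1 = c(30, 10, 80, 15, 40, 25),
  s2 = c(25, 12, 90, 10, 45, 18),
  s3 = c(35,  8, 70, 20, 50, 17)
)
colnames(counts) <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")

# Relative contribution of each type within each sample (rows sum to 1).
rel <- counts / rowSums(counts)

n        <- nrow(rel)
mean_rel <- colMeans(rel)
sd_rel   <- apply(rel, 2, sd)
# Half-width of a t-based 95% confidence interval for the mean (assumed method).
ci95 <- qt(0.975, df = n - 1) * sd_rel / sqrt(n)

spectrum <- data.frame(type = colnames(rel), mean = mean_rel,
                       lower = mean_rel - ci95, upper = mean_rel + ci95)
print(spectrum)
```

With only three samples the intervals are wide, which is worth keeping in mind when comparing bars between tissues.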
The spectrum can also be plotted per group, or with a different kind of error bar:
p4 <- plot_spectrum(type_occurrences, by = tissue, CT = TRUE, legend = TRUE)
p5 <- plot_spectrum(type_occurrences, CT = TRUE, legend = TRUE, error_bars = "stdev")
p <- grid.arrange(p4, p5, ncol = 2, widths = c(4, 2.3))
ggsave(filename = paste0(opt$od, "/Mutation_spectrum_group.png"), width = 10, height = 4, plot = p)
3) SNVs: 96 mutational profile
First, build a count matrix of the 96 trinucleotide mutation types:
## 3) SNVs: 96 mutational profile
mut_mat <- mut_matrix(vcf_list = grl, ref_genome = ref_genome)
head(mut_mat)
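The number 96 comes from 6 substitution types times 4 possible upstream bases times 4 possible downstream bases. The base R sketch below builds these 96 categories and counts the twelve example mutations from the type_context output above. It does not call the package; the variable names and the category ordering are assumptions made for illustration.

```r
bases <- c("A", "C", "G", "T")
subs  <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")

# All combinations; the downstream base varies fastest, the substitution slowest.
grid <- expand.grid(down = bases, up = bases, sub = subs, stringsAsFactors = FALSE)
categories <- paste0(grid$up, "[", grid$sub, "]", grid$down)

# Types and contexts of the first 12 mutations, as printed by type_context above.
types   <- c("T>G", "T>C", "C>T", "T>C", "C>A", "T>A",
             "T>C", "C>T", "C>T", "C>A", "C>T", "C>A")
context <- c("CTG", "GTT", "ACA", "CTT", "TCA", "GTT",
             "ATT", "GCG", "GCT", "ACA", "ACG", "TCC")

# Combine upstream base, substitution and downstream base into one label.
observed <- paste0(substr(context, 1, 1), "[", types, "]", substr(context, 3, 3))

# Count how often each of the 96 categories was observed.
counts <- tabulate(match(observed, categories), nbins = length(categories))
names(counts) <- categories
print(counts[counts > 0])
```

Doing this for every sample and binding the resulting vectors as columns gives a matrix with the same layout as a 96-row count matrix.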
Visualize the 96 mutational profile, here for two of the samples:
p <- plot_96_profile(mut_mat[, c(1, 7)])
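The plot shows one 96-type profile per selected sample.
4) SNVs: Larger contexts
You can also look at a larger mutation context. However, this is only useful when there are many mutations.
## 4) SNVs: Larger contexts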
mut_mat_ext_context <- mut_matrix(grl, ref_genome, extension = 2)
head(mut_mat_ext_context)
p <- plot_profile_heatmap(mut_mat_ext_context, by = tissue)
ggsave(filename = paste0(opt$od, "/Larger_contexts.png"), width = 10, height = 4, plot = p)
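A quick calculation shows why larger contexts need many mutations: each extra flanking base on both sides multiplies the number of categories by 16. The helper name `n_contexts` and the mutation total used below are hypothetical.

```r
# Hypothetical helper: number of categories for `extension` flanking bases per side.
n_contexts <- function(extension) {
  6 * 4^(2 * extension)  # 6 substitution types, 4 choices for each flanking base
}

extension <- 1:3
categories <- n_contexts(extension)

# Average number of mutations per category for a made-up total of 3000 mutations.
n_mutations <- 3000
print(data.frame(extension, categories, mean_per_category = n_mutations / categories))
```

With `extension = 2` the same mutations are spread over 1536 categories instead of 96, so most cells of the matrix are empty unless the samples carry many mutations.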
Indels
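First, get the COSMIC indel contexts with get_indel_context, which adds the columns muttype and muttype_sub to the GRangesList. The muttype column contains the main indel type. The muttype_sub column gives the number of repeat units; for microhomology (mh) deletions it gives the mh length instead. Then count the number of indels per type with count_indel_contexts.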
### Indels
indel_grl <- get_indel_context(indel_grl, ref_genome)
head(indel_grl[[1]], n = 3)
indel_counts <- count_indel_contexts(indel_grl)
head(indel_counts)
              blood1 blood2 blood3
C_deletion_1      37     10      2
C_deletion_2      14      4      1
C_deletion_3       4      2      2
C_deletion_4       1      1      0
C_deletion_5       1      0      0
C_deletion_6+      1      0      0
Plot the indel spectra:
p <- plot_indel_contexts(indel_counts, condensed = TRUE)
ggsave(filename = paste0(opt$od, "/indel_spectra.png"), width = 12, height = 7, plot = p)
Plot only the main indel types, without the details:
## Main indel types only
p <- plot_main_indel_contexts(indel_counts)
ggsave(filename = paste0(opt$od, "/indel_main_spectra.png"), width = 10, height = 7, plot = p)
DBSs
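Get the COSMIC DBS contexts with get_dbs_context, which changes the REF and ALT columns of the GRangesList. Then count the number of DBSs per type with count_dbs_contexts. This again gives a matrix similar to mut_mat, which can then be plotted: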
### DBSs
head(dbs_grl[[1]])
dbs_grl <- get_dbs_context(dbs_grl)
head(dbs_grl[[1]])
dbs_counts <- count_dbs_contexts(dbs_grl)
p <- plot_dbs_contexts(dbs_counts, same_y = TRUE)
ggsave(filename = paste0(opt$od, "/DBSs_spectra.png"), width = 10, height = 7, plot = p)
You can also plot only the main contexts, based on the reference bases alone; the x-axis then shows the reference bases:
p <- plot_main_dbs_contexts(dbs_counts, same_y = TRUE)
ggsave(filename = paste0(opt$od, "/DBSs_main_spectra.png"), width = 10, height = 7, plot = p)
MBSs
No COSMIC contexts exist for MBSs, so they are simply counted by length. Here is the code:
### MBSs
mbs_counts <- count_mbs_contexts(mbs_grl)
p <- plot_mbs_contexts(mbs_counts, same_y = TRUE)
ggsave(filename = paste0(opt$od, "/MBSs_spectra.png"), width = 10, height = 7, plot = p)
The plot shows the number of MBSs of each length, per sample.
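Counting MBSs by length can be sketched in base R as follows. The function name `count_mbs_by_length`, the example alleles, and the choice to pool everything of length 10 or more into a "10+" bin are assumptions made for illustration; they are not taken from the package source.

```r
# Hypothetical helper: count MBSs per length, pooling long ones into a "<max_len>+" bin.
count_mbs_by_length <- function(ref, max_len = 10) {
  len  <- pmin(nchar(ref), max_len)  # cap long MBSs at max_len
  bins <- 3:max_len                  # an MBS spans at least 3 bases
  counts <- tabulate(match(len, bins), nbins = length(bins))
  names(counts) <- c(head(bins, -1), paste0(max_len, "+"))
  counts
}

# Made-up REF alleles of four MBSs (lengths 3, 3, 4 and 12)
mbs_ref <- c("ACG", "TTT", "ACGT", "ACGTACGTACGT")
mbs_len_counts <- count_mbs_by_length(mbs_ref)
print(mbs_len_counts)
```

Running this per sample and binding the results as columns would give a small length-by-sample matrix that can be plotted as bars.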
样本整合
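Sometimes each sample has only a few mutations. In such cases it can be useful to combine multiple samples, which can be done with pool_mut_mat. This works for SNV, indel, DBS and MBS matrices.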
pooled_mut_mat <- pool_mut_mat(mut_mat, grouping = tissue)
head(pooled_mut_mat)
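Pooling amounts to summing the columns of the count matrix that belong to the same group. A minimal base R sketch on a made-up matrix follows; the values, sample names and grouping are hypothetical, and this illustrates the idea rather than the implementation of pool_mut_mat.

```r
# Hypothetical count matrix: 3 categories (rows) by 4 samples (columns).
toy_mat <- matrix(
  c(1, 2, 3,  4, 5, 6,  7, 8, 9,  10, 11, 12),
  nrow = 3,
  dimnames = list(c("A[C>A]A", "A[C>A]C", "A[C>A]G"), c("s1", "s2", "s3", "s4"))
)
toy_grouping <- c("colon", "colon", "liver", "liver")

# rowsum() adds up rows by group, so transpose, sum per group, and transpose back.
toy_pooled <- t(rowsum(t(toy_mat), group = toy_grouping))
print(toy_pooled)
```

The pooled matrix keeps the same rows (mutation categories) and has one column per group, so it can be passed to the same plotting functions as the original matrix.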
Next time: mutational signatures.