WES paper reading notes 3
Title: Whole-exome sequencing reveals the origin and evolution of hepato-cholangiocarcinoma
Journal:Nat Commun
IF:12.353
Personal Comment:
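Terms: H-ChC: hepato-cholangiocarcinoma (combined hepatocellular-cholangiocarcinoma); HCC: hepatocellular carcinoma; iCCA: intrahepatic cholangiocarcinoma
intratumor heterogeneity (ITH): heterogeneity within a single tumor
Defined in the paper: significantly mutated genes (SMGs)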
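Entry point of the paper: H-ChC is a rare tumor type whose molecular mechanism is unclear. On one hand, the cholangiocarcinoma component carries its own private mutations, indicating intratumor heterogeneity; on the other hand, the many ubiquitous mutations shared by the HCC and iCCA components indicate a monoclonal origin of H-ChC.
Mutations in VCAN, ACVR2A and FCGBP may contribute to differentiating HCC from iCCA.
Immunohistochemistry showed that EpCAM is highly expressed in 80% of H-ChC cases, implying stemness of this liver cancer. Our data highlight the monoclonal origin and stemness of H-ChC, as well as substantial intratumor heterogeneity.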
Structure of the introduction: observation of a phenomenon → research progress → scientific question
Scientific question: the cellular origins of HCC and iCCA in H-ChC and the underlying mechanisms remain largely unknown.
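unequivocal: clear, unambiguous
ubiquitous: present everywhere (here: present in all regions of a tumor)
stemness: stem-cell-like properties
HBV integrations: insertion of hepatitis B virus DNA into the host genome
predispose: to make susceptible
overdiploidy: DNA content above diploid
phylogenetic analysis: analysis of evolutionary relationships (here, between tumor regions and clones)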
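WES: 21 samples from 7 H-ChC patients were whole-exome sequenced.
Mean sequencing depth: 93.01× to 215.6×
Somatic single-nucleotide variants (SNVs), indels, somatic copy number variants (CNVs) and HBV integrations were identified to give a first assessment of the distribution of mutations across the H-ChC samples.
Statistics: on average, each H-ChC patient carried 159 non-synonymous SNVs, 6 frameshift indels, 136 CNVs and 2.6 HBV integrations.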
Nonnegative matrix factorization (NMF) approach: can be implemented with the NMF R package.
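To make the NMF step less of a black box, here is a minimal plain-Python sketch of NMF using the Lee–Seung multiplicative update rules for the Frobenius norm. It factorizes a non-negative matrix V into W × H, with all entries kept non-negative. The function names, rank, iteration count and toy matrix are illustrative assumptions; this is not the NMF R package's implementation or the authors' settings.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def nmf(v, rank, iters=500, seed=0, eps=1e-9):
    """Factorize non-negative V (n x m) into W (n x rank) and H (rank x m)
    with Lee-Seung multiplicative updates minimizing ||V - WH||_F^2."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(matmul(w, h), ht)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

def frobenius_error(v, w, h):
    """Frobenius norm of the reconstruction residual V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5
```

In mutational-signature analysis, V would be the 96 mutation classes × samples count matrix, the columns of W the signatures, and H the per-sample exposures.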
359 significantly mutated genes (SMGs) were identified across all samples.
The authors built phylogenetic trees to further explore the clonal structure of H-ChC.
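One building block of this kind of clonal analysis is classifying each mutation by how many tumor regions carry it. Ubiquitous (trunk) mutations found in every region point to a common ancestor, while private mutations reflect intratumor heterogeneity. A minimal Python sketch, assuming each region's mutations are given as a set of hypothetical identifiers; this is simple presence/absence bookkeeping, not PyClone or the authors' tree-building method.

```python
from collections import Counter

def classify_mutations(regions):
    """Classify mutations as 'ubiquitous' (in all regions), 'shared'
    (in more than one region but not all) or 'private' (in one region).

    regions: dict mapping region name -> set of mutation identifiers.
    """
    counts = Counter(m for muts in regions.values() for m in muts)
    n_regions = len(regions)
    result = {"ubiquitous": set(), "shared": set(), "private": set()}
    for mutation, k in counts.items():
        if k == n_regions:
            result["ubiquitous"].add(mutation)
        elif k > 1:
            result["shared"].add(mutation)
        else:
            result["private"].add(mutation)
    return result

# Hypothetical toy data: three regions of one tumor
regions = {"HCC": {"m1", "m2", "m3"},
           "iCCA": {"m1", "m2", "m4"},
           "mixed": {"m1", "m3", "m5"}}
print(classify_mutations(regions))
```

A high fraction of ubiquitous mutations is what supports a monoclonal origin; many private mutations indicate substantial heterogeneity.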
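Nine mutated genes, such as FCGBP and VCAN, were found in all 7 H-ChC patients and are associated with the maintenance of stemness. Twelve mutated genes in the Wnt and Notch pathways, including TP53 and NFATC2/3, which regulate the differentiation of hepatocytes and biliary epithelium, were found in five H-ChC patients. These data suggest that these key genes are involved in the pathways that drive differentiation into HCC versus iCCA in H-ChC. To further confirm the stemness of H-ChC, immunohistochemistry for stem cell markers, including c-kit and EpCAM, was performed on FFPE tissue from 75 patients.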
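Survival analysis was performed on EpCAM protein expression regardless of cancer type. Univariate survival analysis with the log-rank test showed that EpCAM positivity was significantly associated with poor prognosis, but the association was not significant in multivariate Cox regression.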
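The univariate comparison (EpCAM-positive vs EpCAM-negative) corresponds to a two-group log-rank test. A minimal plain-Python sketch of the standard log-rank chi-square statistic with 1 degree of freedom, computed from follow-up times and event indicators; the function name and any toy data are assumptions for illustration, not data from the paper.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up times; events_*: 1 if the event (e.g. death) was
    observed, 0 if censored. Returns (chi-square statistic, p-value, 1 df).
    """
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)] +
            [(t, e, 1) for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in data if e == 1})
    o_minus_e = 0.0  # observed minus expected events in group A
    var = 0.0        # hypergeometric variance summed over event times
    for t in event_times:
        at_risk = [g for tt, _, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in data if tt == t and e == 1)
        d_a = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Identical groups give a statistic of 0 and p = 1; a group whose events all happen early gets a large statistic. A multivariate Cox model, by contrast, adjusts for other covariates, which is one reason a factor can lose significance there.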
Discussion:
We performed CNV analysis on exome data and showed that this approach was more accurate than CNV analysis based on low-depth whole-genome sequencing.
Private somatic CNVs occurred more frequently than private somatic mutations.
These findings support the notion that substantial intratumor heterogeneity exists within H-ChC.
Intratumor heterogeneity in H-ChC poses a great challenge for liver cancer-targeted therapy.
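Mutations that are consistent throughout the tumor may be better therapeutic targets in H-ChC, because they play a more important oncogenic role. It is therefore necessary to sample multiple regions of the tissue under the microscope to understand the complete genetic background of H-ChC, which can help in choosing targeted molecular therapies.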
For each H-ChC sample, monoclonal and biclonal origins are equally likely, just like the two sides of a coin. The probability that different samples share the same clonal features halves with each additional sample, especially since all samples were chosen randomly. Therefore, although seven H-ChC samples is a small number, the conclusion is still reliable.
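This passage is very important: seven samples is a small sample size, but the authors used a probability argument to show that the conclusion is reliable.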
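The coin-flip argument can be checked with one line of arithmetic. Under the assumption stated in the quote (each sample is independently monoclonal or biclonal with probability 1/2), the chance that all n randomly chosen samples look monoclonal purely by chance is (1/2)^n. The function name below is hypothetical.

```python
from fractions import Fraction

def chance_all_monoclonal(n_samples, p_monoclonal=Fraction(1, 2)):
    """Probability that n independent samples all appear monoclonal by
    chance, assuming each is monoclonal with probability p_monoclonal."""
    return p_monoclonal ** n_samples

p7 = chance_all_monoclonal(7)
print(p7, float(p7))  # 1/128 0.0078125
```

For the seven patients this is below 1%, which is the intuition behind "the conclusion is also reliable" despite the small sample.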
Indeed, it is not rigorous to identify HBV integration using WES on FFPE samples. We just regard the HBV integration as supplementary evidence to explore the clonality of H-ChC.
The authors state the limitation honestly and explain why.
The authors' hypothesis: We hypothesize that some key genes or pathways may contribute to the loss of the undifferentiated state of stem cells and/or hepatocyte or biliary epithelial differentiation.
We demonstrated that some gene mutations might lead to the instability of the stem state, whereas others contributed to distinct differentiation of H-ChC.
However, whether there are any corresponding changes in the protein levels of such mutated genes remains unknown, which could be explored using next-generation RNA sequencing. (Note: an application of transcriptome sequencing.)
Nevertheless, the origin of H-ChC is still not clear, as stem cell-like features could derive either directly from stem cells or through reprogramming of mature differentiated liver cancer cells [24].
The authors point out a question the paper has not resolved.
However, when using multivariate Cox analysis, the prognostic value of EpCAM expression was not statistically significant, possibly due to the limited sample size in our study.
EpCAM was not significant in the multivariate Cox regression, and the authors discussed the likely reason in the Discussion.
A new classification is proposed: this is the practical, translational significance of the paper.
In conclusion, exome sequencing analyses suggest a monoclonal origin of H-ChC, which may promote the molecular classification of primary liver cancer on the basis of cell origin. In addition, the substantial intratumor heterogeneity noted in H-ChC calls for multiregional sequencing analysis to find the common driver mutations that play a more important role in carcinogenesis, thus making targeted drug selection more precise and effective.
Key analysis: Exome sequencing data analysis for SNV and INDEL calling
Raw data: FASTQ format
Sequencing platform: HiSeq 4000
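Discarded read pairs:
1. reads contaminated with adapter sequence
2. reads in which undetermined nucleotides (N) exceed 10%
3. paired reads in which either mate has more than 50% low-quality bases
After this filtering, high-quality clean data were obtained.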
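The three filtering rules can be written as a small predicate on read pairs. A minimal sketch, assuming reads are (sequence, Phred+33 quality string) tuples, that adapter contamination is detected by a plain substring match against an assumed Illumina adapter prefix, and that "low quality" means Phred ≤ 5; the notes do not state the actual adapter-detection method or low-quality cutoff.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumption: common Illumina adapter prefix
LOW_QUAL = 5               # assumption: Phred <= 5 counts as a low-quality base

def phred_scores(qual, offset=33):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - offset for c in qual]

def read_fails(seq, qual):
    """True if a single read violates any of the three filtering rules."""
    if ADAPTER in seq:                          # rule 1: adapter contamination
        return True
    if seq.upper().count("N") / len(seq) > 0.10:  # rule 2: >10% N
        return True
    low = sum(1 for q in phred_scores(qual) if q <= LOW_QUAL)
    return low / len(seq) > 0.50                # rule 3: >50% low-quality bases

def keep_pair(read1, read2):
    """Keep a read pair only if neither mate fails the filters."""
    return not (read_fails(*read1) or read_fails(*read2))
```

Because the pair is dropped when either mate fails, the clean data stay properly paired for the alignment step.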
QC: QC statistics, including total read number, sequencing error rate, percentage of reads with average quality > Q20, percentage of reads with average quality > Q30, and GC content distribution, were calculated.
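These QC metrics can be computed directly from FASTQ records. A minimal sketch, assuming Phred+33 qualities and estimating the error rate from the quality scores themselves via the Phred definition P = 10^(-Q/10); how the authors actually derived their error rate is not stated, so that part is an assumption. Only the overall GC percentage is computed here, not the full distribution.

```python
def qc_stats(records, offset=33):
    """Basic QC statistics from an iterable of (sequence, quality) records."""
    total_reads = total_bases = gc_bases = 0
    error_sum = 0.0
    reads_q20 = reads_q30 = 0
    for seq, qual in records:
        scores = [ord(c) - offset for c in qual]
        total_reads += 1
        total_bases += len(seq)
        gc_bases += sum(1 for b in seq.upper() if b in "GC")
        # Phred definition: Q = -10 * log10(P_error)  =>  P_error = 10^(-Q/10)
        error_sum += sum(10 ** (-q / 10) for q in scores)
        mean_q = sum(scores) / len(scores)
        reads_q20 += mean_q > 20   # reads whose *average* quality exceeds Q20
        reads_q30 += mean_q > 30
    return {
        "total_reads": total_reads,
        "error_rate": error_sum / total_bases,
        "pct_q20": 100 * reads_q20 / total_reads,
        "pct_q30": 100 * reads_q30 / total_reads,
        "gc_content": 100 * gc_bases / total_bases,
    }
```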
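Alignment:
Clean paired-end reads were aligned to the UCSC hg19 reference genome.
Tool: Burrows–Wheeler Aligner (BWA)
If a read or read pair maps to multiple locations, BWA chooses the most likely placement; if two or more placements are equally likely, BWA picks one of them at random.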
PCR duplicates generated during amplification were removed; in this GATK workflow the step is handled by Picard.
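Conceptually, duplicate marking groups read pairs that start at the same genomic coordinates on the same strands (as PCR copies of one fragment would) and keeps only the best copy. A simplified Python illustration, assuming each pair is summarized by a hypothetical `ReadPair` record and that the best copy is the one with the highest summed base quality, similar in spirit to Picard's default scoring. Picard MarkDuplicates also handles clipping, libraries, optical duplicates and unpaired reads, all ignored here.

```python
from dataclasses import dataclass

@dataclass
class ReadPair:
    name: str
    chrom: str
    pos1: int       # 5' position of mate 1
    strand1: str    # '+' or '-'
    pos2: int       # 5' position of mate 2
    strand2: str
    qual_sum: int   # sum of base qualities over both mates

def mark_duplicates(pairs):
    """Return the names of pairs flagged as duplicates.

    Pairs sharing chromosome, both 5' positions and both strands are
    treated as PCR copies of one fragment; the copy with the highest
    base-quality sum is kept and the rest are flagged.
    """
    groups = {}
    for p in pairs:
        key = (p.chrom, p.pos1, p.strand1, p.pos2, p.strand2)
        groups.setdefault(key, []).append(p)
    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda p: p.qual_sum)
        duplicates.update(p.name for p in group if p is not best)
    return duplicates
```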
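Next, indels were realigned with RealignerTargetCreator and IndelRealigner, producing a new BAM file that contains all the original reads but with better local alignment in the realigned regions.
Then base quality scores were recalibrated (GATK) to correct systematic errors.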
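The idea behind base quality score recalibration is that the quality a sequencer reports may not match the error rate actually observed. A simplified Python sketch: bin bases by reported quality, count mismatches to the reference while skipping known variant sites (so real polymorphisms are not counted as errors), and convert the empirical error rate back to a Phred score. The data layout and the +1/+2 pseudocounts are assumptions; GATK's BaseRecalibrator also models read group, machine cycle and sequence-context covariates.

```python
import math
from collections import defaultdict

def empirical_qualities(observations, known_sites):
    """observations: iterable of (position, reported_quality, is_mismatch).
    known_sites: set of known variant positions (e.g. from dbSNP), skipped.
    Returns a dict: reported quality -> empirical Phred quality.
    """
    counts = defaultdict(lambda: [0, 0])  # quality -> [mismatches, total]
    for pos, q, mismatch in observations:
        if pos in known_sites:
            continue
        counts[q][0] += int(mismatch)
        counts[q][1] += 1
    table = {}
    for q, (errors, total) in counts.items():
        # Pseudocounts keep the estimate finite when no errors are seen.
        error_rate = (errors + 1) / (total + 2)
        table[q] = -10 * math.log10(error_rate)
    return table
```

A bin whose reported Q40 bases actually mismatch often ends up with a much lower recalibrated score.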
Variants (SNPs, INDELs) were called and filtered with GATK HaplotypeCaller and VariantFiltration.
Databases used for comparison: dbSNP [35], 1000 Genomes database
Annotation: ANNOVAR
MuTect [38] and Strelka: identify somatic SNVs and somatic INDELs against matched adjacent non-tumor samples
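MuTect and Strelka use their own statistical models, which are not reproduced here. To illustrate the core idea of a tumor/normal comparison, the sketch below swaps in a simpler technique: a one-sided Fisher's exact test on alt/ref read counts in tumor versus matched normal, combined with a minimum tumor allele fraction and a maximum normal allele fraction. All thresholds and function names are illustrative assumptions.

```python
from math import comb

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]
    (a = tumor alt, b = tumor ref, c = normal alt, d = normal ref),
    testing whether the tumor has a higher alt fraction."""
    row1, col1, n = a + b, a + c, a + b + c + d
    # Sum hypergeometric probabilities of tables at least as extreme (x >= a)
    return sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1)) / comb(n, col1)

def is_somatic(t_alt, t_ref, n_alt, n_ref,
               min_tumor_vaf=0.05, max_normal_vaf=0.02, alpha=0.01):
    """Naive somatic call: variant enriched in tumor and (nearly) absent in
    the matched normal. Assumes non-zero depth in both samples."""
    tumor_vaf = t_alt / (t_alt + t_ref)
    normal_vaf = n_alt / (n_alt + n_ref)
    p = fisher_greater(t_alt, t_ref, n_alt, n_ref)
    return tumor_vaf >= min_tumor_vaf and normal_vaf <= max_normal_vaf and p < alpha
```

A germline heterozygous variant has a similar allele fraction in tumor and normal and is therefore rejected, which is the main reason a matched normal is sequenced.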
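Mutation spectrum analysis:
1. Box plots
2. Clustering analysis and heat maps
3. Non-negative matrix factorization clustering of the 96 substitution classes (single-base substitutions in their trinucleotide context), yielding 3 distinct mutational signatures
4. Comparison with the COSMIC mutational signature database
5. Phylogenetic (clonal) analysis with PyClone
6. In-house software to compare sample mutations with known cancer mutations
Four mutation databases were used: Cancer Gene Census (CGC, 513 genes), Bert Vogelstein (125 genes), SMG (127 genes) and Comprehensive (435 genes)
7. Pathway enrichment analysis with PathScan
Pathway databases: KEGG [54], BioCarta, PID and Reactome
8. Copy number variant analysis using exome sequencing data
Control-FREEC: Control-FREEC constructs copy number profiles from aligned BAM data. The profiles are then normalized, segmented and analyzed to obtain the copy number state of each genomic region. Cancer tissues were compared with matched non-cancerous tissues using Control-FREEC.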
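Mutational-signature NMF usually operates on 96 single-base-substitution classes: six substitution types written from the pyrimidine of the base pair (C>A, C>G, C>T, T>A, T>C, T>G) times 16 combinations of 5' and 3' neighbouring bases. A minimal Python sketch of mapping an SNV with its flanking bases to a class label and building a 96-channel count vector; the function names and the label format are assumptions, not COSMIC's or any package's API.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
# 96 channels: 6 pyrimidine-based substitutions x 16 (5', 3') flank pairs
CHANNELS = [f"{left}[{s}]{right}" for s in SUBSTITUTIONS
            for left, right in product("ACGT", repeat=2)]

def sbs96_label(left, ref, alt, right):
    """Return a label such as 'A[C>T]G' for ref>alt with its 5'/3' flanks."""
    if ref in "AG":
        # Purine reference: describe the mutation on the opposite strand,
        # which reverse-complements the trinucleotide (flanks swap sides).
        left, ref, alt, right = (right.translate(COMPLEMENT),
                                 ref.translate(COMPLEMENT),
                                 alt.translate(COMPLEMENT),
                                 left.translate(COMPLEMENT))
    return f"{left}[{ref}>{alt}]{right}"

def count_vector(snvs):
    """snvs: iterable of (left, ref, alt, right). Counts in CHANNELS order."""
    counts = dict.fromkeys(CHANNELS, 0)
    for snv in snvs:
        counts[sbs96_label(*snv)] += 1
    return [counts[c] for c in CHANNELS]
```

Stacking one such vector per sample gives the 96 × samples matrix that NMF factorizes into signatures.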
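PathScan uses its own statistical model, which is not reproduced here. As a simpler stand-in that conveys what a pathway enrichment test asks, the sketch below computes a one-sided hypergeometric p-value: with N background genes, K of them in a pathway, and n mutated genes of which k fall in the pathway, how likely is seeing k or more by chance? The function name and parameters are illustrative assumptions.

```python
from math import comb

def hypergeom_enrichment(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: N background genes, K pathway
    genes, n mutated genes drawn; k of the mutated genes are in the pathway."""
    upper = min(n, K)
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(k, upper + 1)) / comb(N, n)
```

A small p-value means the pathway contains more mutated genes than expected from its size alone; in practice, p-values across many pathways are also corrected for multiple testing.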
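Control-FREEC's full procedure (GC-content normalization, segmentation, ploidy and contamination handling) is not reproduced here. The sketch below only illustrates the underlying tumor-versus-normal idea for exome data: normalize read counts per target region by library size, take the tumor/normal depth ratio, and convert it to a copy number. It assumes a 100%-pure tumor, a diploid matched normal and non-zero normal counts; the function name is hypothetical.

```python
import math

def copy_number_calls(tumor_counts, normal_counts, ploidy=2):
    """tumor_counts / normal_counts: read counts per target region (same
    order, normal counts > 0). Returns (log2 ratio, integer copy number)
    per region, assuming a pure tumor and a diploid matched normal."""
    t_total, n_total = sum(tumor_counts), sum(normal_counts)
    calls = []
    for t, n in zip(tumor_counts, normal_counts):
        # Library-size normalization, then tumor/normal depth ratio
        ratio = (t / t_total) / (n / n_total)
        log2r = math.log2(ratio) if ratio > 0 else float("-inf")
        calls.append((log2r, round(ploidy * ratio)))
    return calls
```

A log2 ratio of 0 corresponds to 2 copies, +1 to 4 copies (a gain) and -1 to 1 copy (a loss); real tools then segment neighbouring regions so that single noisy targets do not produce spurious CNV calls.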
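Having read this far, I feel strongly that I have a lot to catch up on later!!!
1. Downloading data from EBI with Aspera
2. WES downstream analysis
3. Building phylogenetic trees
4. Drawing circos plots
5. Getting to know SNP and variant databases
6. Using various tools, e.g. Control-FREEC
7. Understanding algorithms: non-negative matrix factorization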