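10X Single-Cell (10X Spatial Transcriptomics) Joint TCR and Transcriptome Analysis with TCRdist3 (6): Clonotype Neighbor Graph Analysis (CoNGA)

Hello everyone. Today we continue the series with a particularly important analysis method: joint analysis of the transcriptome and the TCR. The reference is "Linking T cell receptor sequence to transcriptional profiles with clonotype neighbor graph analysis (CoNGA)", published in Nature Biotechnology (impact factor 54). This is the original paper.

The method is especially important. This post covers the paper; example code will follow in a later post.

I have excerpted the key points of the paper below.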

Abstract

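1. Multi-modal single-cell technologies capable of simultaneously assaying gene expression and surface phenotype across large numbers of immune cells have described extensive heterogeneity within these complex populations, in healthy and diseased states. (Single-cell multi-omics now covers a lot of ground. The combination drawing the most attention lately is single-nucleus RNA + ATAC, and 5' + TCR is another very important direction; if resources allow, single-cell proteomics can be added as well.)

2. These rich, high-dimensional datasets have the potential to reveal new relationships between TCR sequence and T cell phenotype that go beyond identifying features shared by clonally related cells. (Identifying relationships between the TCR and the transcriptome is also very difficult, as anyone who has attempted this kind of analysis will know.)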

Introduction

1. Previous work pairing gene expression and TCR sequence has largely focused on the TCR sequence as a unique 'barcode' by which to identify clonally related cells. (This is of course very important for cancer research.)

2. From these works we see that T cell clones derived from a common clonal ancestor tend to display a similar transcriptional profile. (This is the first time I have seen it stated this way, but it is probably right.)

3. However, the relationship between TCR sequence similarity and cellular phenotype has not been systematically explored using the large single-cell datasets now available. (Most current studies analyze the two separately.)

4. but approaches that can identify completely new populations or subpopulations by correlating GEX and TCR sequence have not been reported. Also lacking are methods for identifying correlations between TCR sequence and GEX that do not extend to global similarity or associate with a defined cell population. (Papers doing this kind of analysis are indeed rare; I can hardly recall any.)

5. In parallel to the developments in single-cell profiling, methods for quantifying TCR repertoire features and identifying patterns within them have matured, helping extend our understanding of T cell biology. (Separate analysis of each modality is of course mature.) This is where TCRdist, the method covered earlier in this series, comes in: it computes the similarity between TCR sequences.

Key point 6. it is clear that T cells targeting the same pathogen-derived epitope utilize T cell receptors that share consistent, definable amino acid motifs. (No doubt about it; otherwise how would they work?)

7. In addition to these conventional T cell responses, it is well known that certain unconventional T cell populations, such as mucosal-associated invariant T (MAIT) cells and invariant natural killer T (iNKT) cells, are characterized by conserved TCR sequence features and GEX profiles. (The implication here seems to be that identical TCR sequences come with similar gene expression.)

8. Many distinct T cell subsets have been described, along with markers suitable for enriching them, but it is likely that other subsets linked by TCR and GEX remain undiscovered.

9. hypothesized that by identifying correlations between "TCR neighborhoods", defined by shared sequence features, and gene expression, we could overcome the strict limitation of examining these correlations within individual clonal families and potentially identify novel associations between T cell antigen-specificities and phenotypes. (This is the answer I have long wanted as a bioinformatician.)

10. CoNGA identifies correlations between GEX profile and TCR sequence features by analyzing similarity graphs defined over a set of T cell clonotypes. The authors apply it to a collection of publicly available T cell datasets to search, in an unbiased manner, for T cell populations linked by covariation between their repertoire features and GEX profiles.

11. T cell populations specific for individual pMHC epitopes showed distinct gene expression profiles, with EBV epitope-specific T cell populations appearing to cluster according to the stage (latent vs early) of the antigen from which the peptide epitope was derived. (So the TCR and the transcriptome appear to be very strongly correlated.)

Results

CoNGA algorithm (key points)

CoNGA was developed to identify correlations between gene expression profile and TCR sequence in diverse T cell populations without prior knowledge of the precise nature of these correlations. (That is, a correlation analysis between gene expression information and TCR information.)

The authors envisioned two broad categories of correlation:

1. one based on similarity, in which cells similar with respect to GEX are also similar with respect to TCR sequence, and one based on features, in which specific aspects of GEX and of TCR sequence are correlated, without global similarity of both properties. (Personally I lean toward the second.)

2. CoNGA graph-vs-graph correlation was developed to detect the first category of correlation, using the mathematical concept of graph neighborhoods to formalize our intuitive notion of global similarity.

3. Without prior knowledge of the relevant features, de novo discovery of feature-based correlations is more challenging, because it requires enumerating and testing all possible feature pairs. (Though it is also the most reliable.)

4. CoNGA graph-vs-feature analysis represents a compromise approach in which we assume that, at least on one side of the correlation, some degree of global similarity is present.

5. we then enumerate possible features defined by the other property, and test for graph neighborhoods with biased feature distributions. (How the features are found is the key point.) In practice, we find substantial overlap between the results of these two approaches, as, for example, when the identified features in graph-vs-feature correlations are marker genes for a subpopulation of cells that also share detectable global similarity of gene expression.

6. However, we also see cases in which graph-vs-feature analysis reveals a correlation, `for example between expression of a specific gene and usage of a particular V gene segment`, that is not characterized by global similarity with respect to both gene expression and TCR sequence. (I expect this kind of correlation is very hard to find.)

7. These two approaches are also quite complementary: retrospective analysis of graph-vs-graph correlations can, as in the case of the putative MHC-independent population described below, suggest specific gene expression or TCR sequence features that can then be input to graph-vs-feature analysis for sensitive detection of specific correlations. (The two modes are closely related.)

(Definition) CoNGA similarity graphs are defined at the level of clonotypes rather than individual cells. T cells of the same clonotype, which by definition have the same TCR sequence, tend to have similar GEX profiles.

[Figure]

Thus, similarity graphs based on gene expression drawn at the level of individual cells will contain many edges connecting cells within the same clonal family. (The two observations corroborate each other.)

8. In the TCR similarity graph, each node (clonotype) is connected by edges to its K nearest-neighbor (KNN) nodes based on TCR similarity as assessed by the TCRdist measure, which scores sequence similarity in the pMHC-contacting CDR loops of the TCR alpha and beta chains (here K is an adjustable parameter specified as a fraction of the total number of clonotypes). (This looks like the neighbor-finding approach borrowed from single-cell transcriptome analysis, except that the distance used here is TCRdist, which was covered in earlier posts.)
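As a rough illustration of the graph construction just described, the sketch below builds a K-nearest-neighbor graph from a precomputed distance matrix, with K set as a fraction of the number of clonotypes. The function name `knn_graph`, the parameter `nbr_frac`, and the toy distances are assumptions made for illustration; they are not CoNGA's API, and the numbers are not real TCRdist values.

```python
def knn_graph(dist, nbr_frac=0.1):
    """Return one neighbor set per clonotype from a square distance matrix.

    K is derived from nbr_frac (a fraction of the number of clonotypes),
    with a floor of 1 so that tiny toy inputs still get a neighbor.
    """
    n = len(dist)
    k = max(1, int(nbr_frac * n))
    neighbors = []
    for i in range(n):
        # Rank every other clonotype by distance to i (index breaks ties).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: (dist[i][j], j))
        neighbors.append(set(others[:k]))
    return neighbors


# Toy symmetric distance matrix for four clonotypes (invented values).
toy_dist = [
    [0, 12, 90, 95],
    [12, 0, 88, 93],
    [90, 88, 0, 15],
    [95, 93, 15, 0],
]
tcr_nbrs = knn_graph(toy_dist, nbr_frac=0.25)  # K = 1 for n = 4
```

The same function could be applied to a GEX distance matrix to obtain the second graph, since only the distance measure differs between the two.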

9. In the gene expression (GEX) similarity graph, each clonotype is connected by edges to its KNN clonotypes based on similarity in GEX profile (the Methods describe in detail how this is done). Expanded clones are represented by the GEX profile of a single representative cell, the one with the smallest average distance to the rest of the clonal family.
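Choosing the representative cell of an expanded clone amounts to picking a medoid. Here is a minimal sketch, assuming each cell is summarized by a low-dimensional coordinate (for example a few principal components) and that Euclidean distance is an acceptable stand-in for the GEX distance; `pick_representative` and the coordinates are hypothetical.

```python
from math import dist


def pick_representative(cells):
    """cells maps cell id -> coordinate tuple for one clonal family.

    Returns the id of the cell with the smallest mean distance to the
    other cells of the same clone.
    """
    ids = sorted(cells)
    if len(ids) == 1:
        return ids[0]

    def mean_distance(cid):
        total = sum(dist(cells[cid], cells[other]) for other in ids if other != cid)
        return total / (len(ids) - 1)

    return min(ids, key=mean_distance)


# Three cells of one clone, placed on a line for easy checking.
clone = {"cell_a": (0.0, 0.0), "cell_b": (1.0, 0.0), "cell_c": (5.0, 0.0)}
representative = pick_representative(clone)  # cell_b has the smallest mean distance
```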

10. In graph-vs-graph correlation analysis (the figure deserves careful study), CoNGA identifies statistically significant overlap between the GEX similarity graph and the TCR similarity graph.

[Figure]

Figure caption: T cell clonotype neighbor graph analysis (CoNGA). (a) In graph-vs-graph analysis, CoNGA identifies correlation between T cell gene expression (GEX) and TCR sequence by constructing a gene expression similarity graph and a TCR sequence similarity graph and looking for statistically significant overlap between them. Overlap is assessed on a per-clonotype basis by counting the number of edges that originate at each clonotype and are shared between the two graphs, or equivalently by measuring the overlap between each clonotype's GEX graph neighbors and its TCR graph neighbors, and assigning a score that reflects the likelihood of seeing equal or greater overlap by chance. (b) A single clonotype and its GEX and TCR neighbors are shown in the GEX (left panel) and TCR (right panel) 2D UMAP projections for the 10x 200k donor2a dataset. The clonotype is marked with a black 'x', its GEX neighbors are shown as blue points, its TCR neighbors as green points, and the clonotypes that are both GEX and TCR neighbors are shown in red. The significance of the observed overlap (8 clones shared between two neighbor sets of size 24 in a total population of 2427 clonotypes) is calculated using the hypergeometric distribution, giving a P value of 1.7×10⁻¹¹.

11. Method: We consider each node (clonotype) in turn, count the overlap between its neighbors in the two graphs, and assign a significance score that contrasts this observed overlap to that expected under a simple null model: the CoNGA score for this clonotype, equal to the hypergeometric probability of seeing the observed overlap by chance, multiplied by the total number of clonotypes, to adjust for multiple testing. (Looks like some algorithmic background is needed here.)

12. Interpreting the score: CoNGA scores range from 0 to the number of clonotypes; scores close to 0 are significant, scores around 1 are borderline, and scores above 1 are expected to occur by chance. `This mode of analysis identifies T cell clonotypes whose neighbors in gene expression space overlap significantly with their neighbors in TCR sequence space.` (Joint analysis like this should be more valuable, but so far we have usually looked at only the transcriptome or only the TCR and ignored the information in their combination.)
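The score definition can be checked by hand with the numbers quoted in the figure caption (overlap of 8, two neighbor sets of size 24, 2427 clonotypes). The sketch below computes the hypergeometric upper tail with the standard library and multiplies by the number of clonotypes. The helper names are hypothetical, and whether the real implementation excludes the central clonotype from the population size is not stated here, so the full count is used as an assumption.

```python
from math import comb


def hypergeom_tail(overlap, population, n_gex, n_tcr):
    """P(X >= overlap) when n_tcr items are drawn from `population`
    items of which n_gex are 'GEX neighbors'."""
    upper = min(n_gex, n_tcr)
    tail = sum(comb(n_gex, x) * comb(population - n_gex, n_tcr - x)
               for x in range(overlap, upper + 1))
    return tail / comb(population, n_tcr)


def conga_style_score(overlap, n_clonotypes, n_gex, n_tcr):
    """Hypergeometric P value multiplied by the number of clonotypes."""
    return hypergeom_tail(overlap, n_clonotypes, n_gex, n_tcr) * n_clonotypes


# Numbers taken from the caption of the overlap figure.
p_value = hypergeom_tail(8, 2427, 24, 24)
score = conga_style_score(8, 2427, 24, 24)
print(f"P = {p_value:.2e}, score = {score:.2e}")
```

The resulting P value should land near the 1.7×10⁻¹¹ quoted in the caption, and the score stays far below the borderline value of 1.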

13. The authors model the concept of a clonotype's neighbors in GEX or TCR space using the mathematical concept of a graph neighborhood, defined as all the vertices directly connected to one central vertex.

[Figure]

Figure caption: (d) The gene KLRB1 (CD161) shows a non-uniform distribution over the TCR sequence landscape, with discrete regions of higher expression (red) against a background of lower expression (blue), suggesting correlation between gene expression and TCR sequence. This is quantified for a single clonotype (green outline) and its TCR sequence neighbors (black outlines) in the inset violin plot, which shows the KLRB1 expression level for the clonotype and its neighbors on the right and for the remainder of the dataset on the left. The Mann-Whitney-Wilcoxon P value for this expression difference is 1.5×10⁻⁴⁶.

14. CoNGA's second mode of analysis, graph-vs-feature analysis, was developed to detect GEX/TCR correlation that involves specific gene expression or TCR features rather than overall similarity. (This analysis may be the more valuable one.) This mode of analysis can identify TCR sequence neighborhoods with differentially expressed genes (that is more like it), for example, or gene expression neighborhoods with distinctive CDR3 sequence features (length, hydrophobicity, charge, etc). (With some background in place this reads much more easily; going straight into this paper at the start, a lot of it made no sense.)

15. In graph-vs-feature correlation analysis, CoNGA maps numerical features derived from one property (gene expression or TCR sequence) onto the similarity graph defined by the other property and looks for neighborhoods in the graph with unexpectedly high or low feature distributions. (The results are also presented nicely.)

[Figure]

Figure caption (c): In graph-vs-feature analysis, a numerical feature defined by one property (here gene expression) is mapped onto a similarity graph defined by the other property (TCR sequence), and graph neighborhoods with skewed score distributions are identified using statistical tests that compare the scores for each neighborhood (including the central vertex) with the scores of the remaining clonotypes.
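To make the neighborhood test concrete, here is a small sketch that compares a feature inside each graph neighborhood (central vertex included) with the rest of the dataset using a Mann-Whitney U test. It uses a normal approximation with tie correction and no continuity correction, written with the standard library only; the function names, the toy expression values, and the toy graph are assumptions, and no multiple-testing adjustment is applied.

```python
from math import erfc, sqrt


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U P value (normal approximation)."""
    pooled = sorted((v, g) for g, vals in enumerate((a, b)) for v in vals)
    n1, n2 = len(a), len(b)
    n = n1 + n2
    rank_sum_a, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2          # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_a - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0
    z = (u - n1 * n2 / 2) / sqrt(var_u)
    return erfc(abs(z) / sqrt(2))


def graph_vs_feature(feature, nbrs):
    """One raw P value per clonotype: neighborhood vs remaining clonotypes."""
    pvals = []
    for i, nb in enumerate(nbrs):
        members = nb | {i}
        inside = [feature[j] for j in sorted(members)]
        outside = [feature[j] for j in range(len(feature)) if j not in members]
        pvals.append(mann_whitney_p(inside, outside))
    return pvals


# Toy data: a gene that is high in clonotypes 0-3 and low elsewhere,
# mapped onto an invented TCR neighbor graph.
expr = [5.0, 4.5, 4.8, 5.2, 0.1, 0.3, 0.2, 0.0, 0.4, 0.6]
tcr_graph = [{1, 2, 3}, {0, 2, 3}, {0, 1, 3}, {0, 1, 2}, {5, 6, 7},
             {4, 6, 7}, {4, 5, 7}, {4, 5, 6}, {6, 7, 9}, {6, 7, 8}]
pvals = graph_vs_feature(expr, tcr_graph)
```

Because the test is two-sided, neighborhoods with unusually low values are flagged as well as those with unusually high values, which matches the "high or low" wording above.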

Next, the method is applied to specific cases. (Honestly, this part is a bit hard to follow.)

CoNGA graph-vs-graph analysis identifies correlation between gene expression and TCR sequence. (The first application is to public data: paired T cell transcriptomes and TCRs.)

[Figure]

The figure above illustrates the CoNGA graph-vs-graph analysis workflow for two datasets of human peripheral blood T cells, one a mix of CD4+ and CD8+ cells (vdj_v1_hs_pbmc, Fig. 2a-c) and one containing flow-sorted CD8+ T cells (10x_200k_donor2a, Fig. 2d-f). Let us go through the analysis steps in detail.

Step 1: the UMAP algorithm is applied to the gene expression and TCRdist matrices of each dataset (the latter being the matrix of pairwise TCR sequence distances) to generate two dimensional projections of the GEX and TCR landscapes. (Effectively dimensionality reduction.)

Step 2: a graph-based clustering algorithm is applied to the GEX matrix to partition the dataset into clusters of clonotypes with similar transcriptional profiles and to the TCR distance matrix to produce clusters of clonotypes with similar TCR sequences. (Effectively clustering.) The GEX and TCR landscape projections are colored by CoNGA score to visualize the relative location of the top-scoring CoNGA hits in these landscapes. (The score was defined above; keep its properties in mind.)

Step 3: the GEX and TCR cluster assignments of CoNGA hits with scores below a threshold (here 1.0) are shown in the 2D projections using bicolored disks whose left (right) half corresponds to the GEX (TCR) cluster assignment. (This shows how the CoNGA scores are distributed.)

These plots reveal that both datasets contain a substantial number of clonotypes with significant CoNGA scores, and that these CoNGA hits are located in specific regions of the GEX and TCR landscapes. (Consistent with the earlier expectation.)

Step 4: To gain insight into these groups of related clonotypes, we leverage the fact that each dataset has been clustered for both GEX and TCR sequence similarity, independently, and thus each clonotype maps to a pair of clusters (a GEX cluster and a TCR sequence cluster). (A mapping: clusters from the two sides are paired.) These cluster pairs provide useful handles by which to identify CoNGA hits because they contain information on GEX and TCR, allowing us to map between the two landscapes (which would require a four-dimensional plot for direct visual correspondence).

For example, in Figure 2a at the top of the GEX landscape we can see a cluster of CoNGA hits which all belong to GEX cluster 2 (light green on the left half of the disk) and TCR cluster 3 (red on the right half of the disk), or equivalently, cluster pair (2,3). (This is the first time I have seen this kind of pairing in a paper.) We can infer that these correspond to the group of clonotypes in the TCR landscape also located near the top of the plot, that they are likely CD8+ (from the thumbnail in Fig. 2b), and largely TRAV14 (from the TCR cluster identifier in Fig. 2a). (A clever way to pair things.)

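Each cluster pair containing at least an arbitrary minimum number of CoNGA hits (here 5) is characterized by a row of sequence-logo-style visualizations (Fig. 2c/f). These highlight the salient features of the CoNGA hits in that pair, including the most significant DEGs, TCR gene segment usage, CDR3 motifs, and a GEX logo showing several landmark genes that define canonical T cell subsets (CD4, CD8, etc.). They are laid out in a consistent format that can be scanned to quickly place a cluster among the major cell subsets.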
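The bookkeeping behind cluster pairs is simple enough to sketch: keep the clonotypes whose score is below the threshold, label each with its (GEX cluster, TCR cluster) pair, and retain pairs with enough hits. The function name, argument names, and toy values below are hypothetical; only the threshold of 1.0 and the minimum of 5 hits come from the text.

```python
from collections import Counter


def cluster_pairs_with_hits(scores, gex_clusters, tcr_clusters,
                            score_threshold=1.0, min_hits=5):
    """Count CoNGA hits per (GEX cluster, TCR cluster) pair and keep
    the pairs that contain at least min_hits hits."""
    hits = [i for i, s in enumerate(scores) if s < score_threshold]
    counts = Counter((gex_clusters[i], tcr_clusters[i]) for i in hits)
    return {pair: n for pair, n in counts.items() if n >= min_hits}


# Toy input: six hits in pair (2, 3), two hits in pair (0, 1),
# and two clonotypes whose scores are not significant.
toy_scores = [0.01, 0.2, 0.5, 0.03, 0.9, 0.4, 0.1, 0.7, 12.0, 300.0]
toy_gex = [2, 2, 2, 2, 2, 2, 0, 0, 2, 1]
toy_tcr = [3, 3, 3, 3, 3, 3, 1, 1, 3, 4]
pairs = cluster_pairs_with_hits(toy_scores, toy_gex, toy_tcr)
```

Only pair (2, 3) survives here; pair (0, 1) has hits but too few of them to be summarized.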

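The paper goes on to present further graph-vs-graph case studies, which turned up findings that had not been noticed before. The cluster pairs found this way characterize both marker genes (mainly the significantly differentially expressed genes) and the motifs of the associated TCR groups. Linking the two is very nice, because it addresses the biology from both the functional side and the epitope-recognition side. Let us look at these cases.

Case 1: CoNGA defines a HOBIT+/HELIOS+ T cell population shared across multiple donors

1. The data here are CD8+ T cell populations enriched for epitope specificity using pMHC multimers, and the experiments did show clear epitope-specific responses. Some non-specific binding was also found, for example to MAIT cells, or to cells that were very likely part of epitope-specific responses to other epitopes. So not every TCR among the collected T cells specifically binds the supplied epitopes. CoNGA detected a large number of significant GEX/TCR correlations across these datasets, identifying 62 cluster pairs of size at least 5 and 42 using the more stringent size threshold of 0.1% of the dataset. (So the analysis can turn up many possibilities.)

[Figure]

Further analysis divides the hits into three groups (see the figure above): (1) Flu M158-responding clones; (2) MAIT cells; (3) a population of clonotypes with a shared expression profile (high expression of genes including the transcription factors ZNF683 (aka HOBIT) and IKZF2 (aka HELIOS), along with DUSP1/2, CD7, CD99, and KLRD1), diverse TCR gene usage, and rather long CDR3 regions. The third group is the one of interest. (The point of grouping the hits this way is to find the biological meaning of the pairs, in terms of epitope recognition and gene expression changes.)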

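To dissect the third group (the HOBIT-expressing clonotypes) further, the authors compared its TCR sequences against background. As expected from examination of the TCR sequence logos in Figure 3, the CDR3α and CDR3β loops are significantly longer in the HOBIT+ CoNGA population than in background (the change in length is one point to note). The CDR3 regions also (1) carry more positive charge (P<10⁻⁴⁰); (2) contain more aromatic residues, especially tryptophan (P<10⁻⁶⁰), and more hydrophobic and bulky amino acids in general; (3) contain more cysteine (>100-fold enriched in CDR3β, P<10⁻⁵⁰). (So TCR analysis ultimately comes down to protein sequence analysis.) These sequence features closely resemble those found in experimental studies of TCR repertoires from MHC knockout mice, which compared MHC-independent TCR sequences with MHC-restricted ones.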
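The kinds of CDR3 properties compared above can be tallied with a few lines of code. The sketch below uses deliberately simple definitions that are assumptions of this example, not the paper's property scales: net charge counts K/R as +1 and D/E as -1 (histidine ignored), aromatic residues are F/W/Y, and cysteines are counted after the first position on the assumption that the sequence starts with the conserved cysteine. Both sequences are made up for illustration.

```python
def cdr3_features(cdr3):
    """Simple per-sequence CDR3 summary statistics."""
    return {
        "length": len(cdr3),
        "net_charge": sum(cdr3.count(a) for a in "KR")
                      - sum(cdr3.count(a) for a in "DE"),
        "aromatic": sum(cdr3.count(a) for a in "FWY"),
        "tryptophan": cdr3.count("W"),
        # Skip position 0, assumed to be the conserved cysteine.
        "extra_cysteine": cdr3[1:].count("C"),
    }


typical = cdr3_features("CASSLGQGYEQYF")   # illustrative CDR3-like string
unusual = cdr3_features("CASRWCKRWGF")     # invented, charged and Trp/Cys-rich
```

Comparing such per-clonotype values between a population of interest and the background is the kind of contrast the P values above summarize.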

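The cysteine content of MHC-restricted TCRs is thought to reflect negative selection in the thymus, imposed through disulfide bond formation with cysteines in MHC-presented peptides. Hydrophobic residues located at the apex of the CDR3 region are important for mediating interactions with self-peptide MHC in the thymus. Based on these trends, the authors hypothesize that this CoNGA-identified population represents a non-canonical, self-specific or MHC-independent T cell population. To facilitate analysis, they developed a numerical score, the iMHC score (for "independent of pMHC"), which captures the defining CDR3 sequence features of this population. (Note that "independent" here is the opposite of MHC-restricted: a higher iMHC score means CDR3 features more like those of MHC-independent TCRs.)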

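The authors next sought to determine the frequency of the HOBIT+ population among peripheral blood T cells, based on putative cell-surface markers identified from its DEGs. The first sample (10x_200k_donor1) suggested that they were likely CD45RA⁺ CD45RO^dim based on TotalSeq labeling, negative for CCR7 expression, and positive for KLRC2, KLRC3, and a number of KIR genes. In this donor the population is therefore characterized as CD45RA⁺ CD45RO^dim/- CCR7^- KLRC2⁺ KLRC3⁺ KIR^+/-. Notably, the report describing the generation of the HOBIT monoclonal antibody (a clinical report, as I understand it) found its expression to be highest in CD45RA⁺ CCR7^- CD8 T cells. Flow cytometry of PBMC samples from healthy blood donors using these surface markers confirmed the presence of cells expressing KLRC2 and/or KIR2D (i.e. KLRC2⁺KIR2D^-, KLRC2⁺KIR2D⁺, and KLRC2^-KIR2D⁺). In the putative HOBIT+ population KLRC2 is always expressed, while KIR2D expression varies. The KLRC2^-KIR2D⁺ phenotype, however, does not match these criteria and may represent a distinct (but sizable) CD8 subset.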

[Figure]

As a percentage of total PBMC CD8 T cells, the KLRC2+ KIR2D+/- subset is in the range of 0.2-10.1% while KLRC2- KIR2D+ cells ranged between 0.3-7.6%.

[Figure]

Next, the authors sorted KLRC2+ KIR2D+/- and KLRC2- KIR2D+ CD8 T cells and used qRT-PCR to measure the expression of ZNF683, KLRC2, and KLRC3 in these populations relative to each donor's own sorted CD8+ CD45RA- CD45RO+ memory subset. Here, we found expression of KLRC2 and KLRC3 was enriched in the KLRC2+ KIR2D+/- CD8 T cells, and to a lesser extent in the KLRC2- KIR2D+ subset.

[Figure]

However, ZNF683 appeared to be enriched only within the KLRC2+ KIR2D+/- subset, supporting their identity as the putative HOBIT+ population and further suggesting KLRC2- KIR2D+ T cells are in fact a separate, distinct subset.

Taken together, these data confirm the presence in peripheral blood of ZNF683-expressing CD8+ CD45RA+ CD45RO^dim/- CCR7- KLRC2+ KIR2D+/- T cells consistent with the HOBIT+ population, and show that this subset, although variable between individuals, makes up a sizable fraction of CD8 T cells (up to 10% in some individuals). (A new subset has been found.)

Case 2: CoNGA identifies GEX/TCR correlation in thymic T cells (an analysis of thymic T cells; the data include donors at different stages, totaling over 9400 clonotypes with paired alpha and beta TCR sequences).

CoNGA identified a large number of significant hits in this rich and complex dataset, primarily within the DP (double-positive), CD8 single positive (SP), CD4 SP, Treg, and CD8αα+ thymic populations.

[Figure]

In TCR sequence space, we see a concentration of hits in the TRAV41 cluster (this TRAV gene is enriched in DP cells), the TRAV1 and TRAV12 clusters (enriched in CD8 cells), and in the TRAV14 cluster (enriched in CD8αα cells). The CD8+ cluster pairs identified by CoNGA also show high CD8 sequence scores and high values of a score ("alphadist") that reflects the genomic distance between the TRAV and TRAJ gene segments incorporated into the clonotype's TCR α chain. The DP cluster pairs show lower alphadist scores, a preference for TRAV41 and other TRAV genes at the 3' end of the locus, longer CDR3 loops (CDR3 length has been shown to decrease during thymic selection), and higher scores for the rim, surface, and disorder amino acid properties, which may indicate CDR3 regions that are more polar, less bulky, and weaker in their interactions. However, CoNGA further identified high iMHC scores and longer CDR3 loops as TCR features of these clusters. Interestingly, the CD8αα(II) cluster pair expressed both ZNF683 and IKZF2, which together with TCR features similar to those of the HOBIT+ T cells in the blood identified above, suggests a possible precursor relationship between these two populations that warrants further investigation. (This does merit further study.)

Now for part two: CoNGA graph-vs-feature analysis confirms sharing of the HOBIT+/HELIOS+ T cell subset across donors. (Time to switch perspective.)

Case 1

We have seen that CoNGA graph-vs-graph analysis can identify a range of correlations between gene expression and TCR sequence, from the invariant MAIT and iNKT lineages, to sequence motifs and expression biases in epitope-specific responses, to the weaker CDR3 sequence preferences and differentially expressed genes that characterize the HOBIT+ population (which would likely be difficult to identify from analysis of TCR sequence or gene expression alone).

To be detected, these correlations must be characterized by some degree of elevated global similarity in both transcriptional profile and TCR sequence within the relevant cell population. (Correlations come at different levels.) Thus, correlations that involve only a few genes or very specific TCR sequence features, or ones that are not well captured by our global GEX and TCR distance measures, may go undetected. (The analysis becomes more fine-grained.) This is where the graph-vs-feature strategy comes in.

2. CoNGA graph-vs-feature analysis was developed as a complementary graph based approach that could detect GEX/TCR correlations that are not characterized by global similarity of both properties. (This is the key point.)

3. In graph-vs-feature analysis, numerical features calculated on the basis of one cellular property, GEX or TCR sequence, are mapped onto a similarity graph defined by the other property, and the feature score distributions for each of the neighborhoods in the graph are compared to the background distributions to identify neighborhoods with skewed scores (here a graph neighborhood consists of a single central vertex together with all of its directly connected neighbors). (Shifting from global to local.)

[Figure]

4. As GEX features, we consider the expression levels of individual genes, and for TCR sequence features, we use a set of CDR3 amino acid property values as well as a handful of additional, sequence-based scores. (Couldn't it all be put this plainly? 😄)

5. We used graph-vs-feature analysis to identify additional members of the HOBIT+/HELIOS+ unconventional T cell subset by looking for GEX graph neighborhoods with elevated iMHC scores. Although the per-clonotype iMHC score is highly variable, by computing averages over GEX graph neighborhoods we can identify a subregion of GEX space with enhanced scores, whose significance can be assessed using standard statistical tests.
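Averaging a noisy per-clonotype score over graph neighborhoods is a small computation. The following sketch assumes the neighborhood includes the central clonotype, as in the definition above; `neighborhood_means`, the score values, and the toy GEX graph are all hypothetical, and the iMHC score itself is not reimplemented here.

```python
def neighborhood_means(scores, nbrs):
    """Mean of a per-clonotype score over each graph neighborhood
    (central vertex plus its directly connected neighbors)."""
    means = []
    for i, nb in enumerate(nbrs):
        members = nb | {i}
        means.append(sum(scores[j] for j in members) / len(members))
    return means


# Noisy toy scores: clonotypes 0-2 form one GEX neighborhood, 3-5 another.
toy_imhc = [2.0, -1.0, 3.0, -0.5, 0.5, -1.5]
gex_graph = [{1, 2}, {0, 2}, {0, 1}, {4, 5}, {3, 5}, {3, 4}]
smoothed = neighborhood_means(toy_imhc, gex_graph)
```

Individual values in the first group range from -1.0 to 3.0, but every neighborhood mean there is 4/3, so the region stands out more clearly after averaging than it does clonotype by clonotype.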

[Figure]

Figure caption: Fig. 6.

Three of the four 10x_200k donors show populations of clonotypes with significantly enhanced iMHC scores (Fig. 6c-f) whose DEGs correlate well with one another and with the key marker genes (ZNF683, CD7, CD99, DUSP1/2) for the original HOBIT+ CoNGA clusters. The donor that departs from this pattern is older, and the number of natural T cells declines with age. Comparing the iMHC score distribution of the HOBIT+ CoNGA clonotypes with that of TCRs with known MHC restriction suggests a possible affinity with other MHC-dependent T cell subsets.

[Figure]

Figure caption: Single-chain iMHC score distributions for TCR subsets. Score distributions for CDR3α repertoires are shown on the left and for CDR3β repertoires on the right. Single-chain variants of the iMHC score were fit with L1-regularized logistic regression just as for the paired iMHC score.

Case 2: Graph-vs-feature analysis reveals differential gene expression across the TCR landscape

(Switching strategy) We applied graph-vs-feature analysis in the reverse direction to identify genes that are differentially expressed in specific TCR graph neighborhoods (the top significant gene for each cluster pair and a maximum of 10 genes per dataset).

The figure below illustrates four graph-vs-feature correlations, showing visually how specific TCR-based and GEX-based features correlate across the 2D clonotype landscapes.

[Figure]

Our TCR graph-based differential expression analysis identified several associations with the EPHB6 gene (and its murine homolog Ephb6), which codes for the Ephrin-B receptor Type 6 protein EPHB6. A recurring feature of these associations is the usage of the TRBV30 gene segment (TRBV31 in mouse). A focused search for covariation between TCR gene segment usage and gene expression using differential expression analysis confirmed a strong tendency for higher EPHB6 expression in clonotypes that incorporate the TRBV30 gene segment. (So the correlation does look strong.)

The TRBV30 segment is unique among TRBV genes in being located downstream of the TRBJ and TRBC genes at the end of the TCR beta locus; incorporation of TRBV30 into the TCR by V(D)J recombination requires an altered joining process in which intervening DNA sequence is inverted rather than being deleted. Providing a potential clue into the mechanism underlying this covariation, EPHB6 is located adjacent to TRBV30 on Chromosome 7, ~40kb downstream from the TCR beta locus.

[Figure]

The strong correlation between TRBV30 usage and EPHB6 expression may indicate that expression of a TRBV30-containing TCR transcript also boosts expression of the EPHB6 gene (the mouse TRBV31 gene segment is located at an analogous location to that of TRBV30 in the mouse TR locus, and is also directly adjacent to the mouse homolog Ephb6). (So one reason genes are co-expressed may be that they sit close together on the chromosome.)

These findings suggest an interplay between the usage of TCR genes at the edge of the TCR locus and the expression of the non-TCR genes flanking that locus. (The most important point.)

Case 3: Neighbor-graph analysis of TCR:pMHC binding highlights GEX similarity among T cells that recognize the same epitope

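For each pMHC, we ask whether the set of clonotypes positive for that pMHC contains more GEX (or TCR) similarity edges than we would expect by chance, and quantify this with a fold-enrichment and an approximate P value.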
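The fold-enrichment part of this test can be sketched as follows. Edges are counted as directed neighbor links that start and end inside the pMHC-positive set, and the expected count assumes each neighbor is equally likely to be any other clonotype. This null model, the function name, and the toy graph are assumptions of this example; the approximate P value is left out because the text does not say how it is computed.

```python
def edge_enrichment(nbrs, positive):
    """Observed vs expected neighbor edges within a set of clonotypes.

    nbrs: list of neighbor sets (one per clonotype)
    positive: set of clonotype indices positive for one pMHC
    """
    n, m = len(nbrs), len(positive)
    observed = sum(len(nbrs[i] & positive) for i in positive)
    # Under random placement, each neighbor of a positive clonotype lands
    # in the positive set with probability (m - 1) / (n - 1).
    expected = sum(len(nbrs[i]) for i in positive) * (m - 1) / (n - 1)
    fold = observed / expected if expected > 0 else float("nan")
    return observed, expected, fold


# Toy graph with eight clonotypes and two neighbors each.
toy_graph = [{1, 2}, {0, 2}, {0, 3}, {2, 4}, {3, 5}, {4, 6}, {5, 7}, {5, 6}]
pmhc_positive = {0, 1, 2}
observed, expected, fold = edge_enrichment(toy_graph, pmhc_positive)
```

Running the same function on the GEX graph and on the TCR graph gives the two enrichment values compared in this section.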

[Figure]

From this analysis we can see, as expected, that nearly all the pMHC-positive clonotype subsets show greater than expected TCR sequence similarity. Indeed, the only pMHCs with a negative TCR neighbor-enrichment score are A03_KLG, which appears to show high levels of non-specific binding. pMHCs with large numbers of analyzed clonotypes show highly significant TCR similarity as assessed by the TCR-pMHC graph overlap. Likewise, pMHC-positive populations show greater than expected GEX similarity, with highly significant P-values and large fold-enrichments for most pMHCs with a sufficient number of analyzed clones. These results indicate that clonotypes positive for the same pMHC have gene expression profiles that are more similar than expected by chance.

[Figure]

Finally, the authors discuss some limitations of the software.

1. a consequence of operating at the level of clonotypes rather than individual cells is that we miss out on variation within the cells of expanded clones.

2. Although we found that gene expression was largely consistent within clonally related cells, it may be worth exploring approaches in which cellular resolution is preserved, for example by defining graphs at the level of individual cells and masking out intra-clonotype neighbor edges to eliminate the strong signal of clonal GEX/TCR correlation. (This is indeed a problem.)

3. results of applying CoNGA will depend critically on the distance measures used to define clonotype similarity and construct the neighbor graphs. There are many methods to choose from.

4. Another limitation is that, in our experience, successful application of CoNGA requires a relatively large number of unique clones (at least a few hundred, so single-cell data are a good fit), which depending on the degree of clonal expansion may require a substantially larger number of individual cells.

We are optimistic that new analytical approaches combined with novel high-throughput single-cell experiments will continue to illuminate new aspects of adaptive immunology in the coming years. Of course, the method is new, which means plenty remains to be filled in.


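This topic is a big one. Life is good, and better with you. In the next post we will cover the principles behind the software's two analysis modes.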
