10X single-cell trajectory analysis (pseudotime analysis): CytoTRACE

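Hello everyone. This time I'd like to share a tool for trajectory analysis: CytoTRACE. The paper is "Single-cell transcriptional diversity is a hallmark of developmental potential", published in Science in January 2020, which is quite impressive and on a par with URD. Of course, I have already shared many trajectory analysis methods, such as "Pseudotime analysis of single-cell data with VIA (advantages the others cannot match)", "A review of 10X single-cell trajectory analysis", "The pseudotime analysis tool Palantir", and, for spatial transcriptomics, "Trajectory analysis for 10X spatial transcriptomics". Today let's see what makes this tool different.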

Abstract

Single-cell RNA sequencing (scRNA-seq) is a powerful approach for reconstructing cellular differentiation trajectories. However, inferring both the state and direction of differentiation is challenging (that much is obvious). Here, we demonstrate a simple, yet robust, determinant of developmental potential—the number of expressed genes per cell—and leverage this measure of transcriptional diversity to develop a computational framework (inferring developmental trajectories from the number of expressed genes?? Impressive) (CytoTRACE) for predicting differentiation states from scRNA-seq data. When applied to diverse tissue types and organisms, CytoTRACE outperformed previous methods and nearly 19,000 annotated gene sets for resolving 52 experimentally determined developmental trajectories (a solid body of evidence behind it). Additionally, it facilitated the identification of quiescent stem cells and revealed genes that contribute to breast tumorigenesis. This study thus establishes a key RNA-based feature of developmental potential and a platform for delineation of cellular hierarchies. (So this method seems to have a lot worth looking at.)

Introduction

In multicellular organisms, tissues are hierarchically organized into distinct cell types and cellular states with intrinsic differences in function and developmental potential. Of course, many new methods already exist, but: Though powerful, these technologies cannot be applied to human tissues in vivo and generally require prior knowledge of cell type–specific genetic markers (cell types must be defined before doing trajectory analysis; anything else is just fooling around). These limitations have made it difficult to study the developmental organization of primary human tissues under physiological and pathological conditions. (I wonder how deep everyone goes when doing pseudotime analysis.)
Single-cell RNA-sequencing (scRNA-seq) has emerged as a promising approach to study cellular differentiation trajectories at high resolution in primary tissue specimens (single-cell really is an epoch-making technology). Most current trajectory analysis tools require:
(1) a priori knowledge of the starting point (and thus, direction) of the inferred biological process (prior knowledge; running trajectory analysis without defining cell types first is just fooling around).
(2) the presence of intermediate cell states to reconstruct the trajectory (intermediate states of differentiation must be present, at least in theory).
These requirements can be challenging to satisfy in certain contexts, such as human cancer development (those of you working on single-cell data from tumor samples probably know this all too well).
Current methods have another shortcoming:
with existing in silico approaches, it is difficult to distinguish quiescent (noncycling) adult stem cells that have long-term regenerative potential from more specialized cells (this situation is actually quite rare in the single-cell data we usually work with), and the utility of gene expression–based models across diverse developmental systems and single-cell sequencing technologies is still unclear.
Here, we systematically evaluated RNA-based features, including nearly 19,000 annotated gene sets, to identify factors that accurately predict cellular differentiation status independently of tissue type, species, and platform (now they start praising their own software). Let's take a look at the theory behind this tool and how it is used.

Result1 RNA-based correlates of single-cell differentiation states (the most crucial part)

Our initial goal was to identify robust, RNA-based determinants of developmental potential without the need for a priori knowledge of developmental direction or intermediate cell states marking cell fate transitions (that is, identifying the direction of development and cell transitions without prior knowledge). Using scRNA-seq data, we evaluated ~19,000 potential correlates of cell potency, including all available gene sets in the Molecular Signatures Database, 896 gene sets covering transcription factor binding sites from ENCODE (17) and ChEA (18), an mRNA expression–derived stemness index (mRNAsi) (15), and three computational techniques that infer stemness as a measure of transcriptional entropy (a general idea of this part is enough). We also explored the utility of “gene counts,” or the number of detectably expressed genes per cell. Although anecdotally observed to correlate with differentiation status in a limited number of settings (this is the focus of the paper: the relationship between the number of expressed genes and development), the reliability of this association and whether it reflects a general property of cellular ontogeny are unknown.
To assess these RNA-based features, we compiled a training cohort consisting of nine gold standard scRNA-seq datasets with experimentally confirmed differentiation trajectories. These datasets were selected to prioritize commonly used benchmarking datasets from earlier studies and to ensure a broad sampling of developmental states from the mammalian zygote to terminally differentiated cells (now these are real developmental trajectories). Overall, the training cohort encompassed 3174 single cells spanning 49 phenotypes, six biological systems, and three scRNA-seq platforms (a comprehensive mix). To evaluate performance, we used Spearman correlation to compare each RNA-based feature, averaged by phenotype, against known differentiation states. We then averaged the results across the nine training datasets to yield a final score and rank for every feature (a correlation test).
This systematic screen revealed many known and unexpected correlates of differentiation status.
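
The screening step described above (a per-cell feature, averaged by phenotype, then compared with known differentiation states by Spearman correlation) can be sketched in plain Python. This is a toy illustration and not the authors' code: the expression matrix, the phenotype labels, the `potency` ordering, and all function names are made up for the example, and "detectably expressed" is assumed to mean a value greater than zero.

```python
from collections import defaultdict


def gene_counts(expr_by_cell):
    """Number of detectably expressed genes (value > 0) in each cell."""
    return [sum(1 for v in cell if v > 0) for cell in expr_by_cell]


def average_ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def score_feature(feature, phenotypes, potency):
    """Average a per-cell feature by phenotype, then compare with known potency."""
    grouped = defaultdict(list)
    for value, label in zip(feature, phenotypes):
        grouped[label].append(value)
    labels = sorted(grouped)
    means = [sum(grouped[label]) / len(grouped[label]) for label in labels]
    return spearman(means, [potency[label] for label in labels])


# Toy data (hypothetical): rows are cells, columns are genes.
expr = [
    [5, 2, 1, 3, 4, 1],  # stem
    [3, 1, 0, 2, 6, 1],  # stem
    [4, 0, 0, 2, 1, 3],  # progenitor
    [2, 0, 1, 0, 3, 5],  # progenitor
    [0, 0, 0, 7, 0, 9],  # mature
    [0, 0, 0, 0, 8, 6],  # mature
]
phenotypes = ["stem", "stem", "progenitor", "progenitor", "mature", "mature"]
potency = {"stem": 3, "progenitor": 2, "mature": 1}  # higher = less differentiated

counts = gene_counts(expr)
rho = score_feature(counts, phenotypes, potency)
print(counts, round(rho, 3))
```

In the paper this kind of score is computed per dataset and then averaged across the nine training datasets; the sketch covers a single dataset only.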

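However, one feature in particular showed notable performance: the number of detectably expressed genes per cell (gene counts) (the gene-count signal is very clear). The rationale given here is that stem cells and pluripotent stem cells express more genes, whereas mature cell types express relatively fewer. Pluripotency genes (look them up if you are interested in this class of genes), by contrast, showed an arc-like pattern early in human embryogenesis that was characterized by progressively increasing expression until the emergence of embryonic stem cells, followed by decreasing expression (an interesting finding).

To summarize this part: cells with greater differentiation potential express relatively more genes, whereas pluripotency genes follow an arc-shaped trend.
These findings suggested that gene counts might extend beyond isolated experimental systems to recapitulate the full spectrum of developmental potential. Next, the authors validated this with mouse data.

The results agree with the earlier ones, with a very high correlation, and the same result was found in other species, suggesting that it is a general feature of cellular ontogeny.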

Next comes the study of the relationship between chromatin accessibility and development.
The authors tested whether single-cell gene counts are ultimately a surrogate for global chromatin accessibility, which has been shown to decrease with differentiation in certain contexts. Genome-wide chromatin accessibility was observed to progressively decrease with differentiation of hESCs into paraxial mesoderm and lateral mesoderm lineages (anyone could have guessed this result).

We observed strong concordance between the number of accessible peaks and the mean number of detectably expressed genes per phenotype.

So this part of the results appears to hold generally.

Result2 Development of CytoTRACE

The number of expressed genes per cell generally showed consistent performance with respect to key technical parameters and was generally correlated with mRNA content (naturally). However, in some datasets, such as that for in vitro differentiation of hESCs into the gastrulation layers, the number of expressed genes per cell exhibited considerable intraphenotypic variation (phenotype information is still used relatively little in single-cell work, but the ATAC part is also quite important).

So trajectory analysis is indeed strongly tied to the number of expressed genes.
We reasoned that genes whose expression patterns correlate with gene counts might better capture differentiation states. Indeed, by simply averaging the expression levels of genes that were most highly correlated with gene counts in each dataset (this has been verified countless times), the resulting dataset-specific gene counts signature (GCS) became the top-performing measure in the screen, outranking every predefined gene set and computational tool that we assessed.

GCS, like gene counts, is inherently insensitive to dropout events, is agnostic to prior knowledge of developmentally regulated genes (in other words, it depends little on technical artifacts and prior knowledge), and is not solely attributable to multilineage priming or a known molecular signature.
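
The idea behind GCS as described above (find the genes most correlated with gene counts, then average their expression in each cell) can be sketched as follows. This is a minimal illustration under stated assumptions, not the CytoTRACE implementation: the toy matrix, the function names, the `top_n` parameter, and the use of a plain arithmetic mean on raw values are all choices made for the example, and the normalization steps a real pipeline would apply are omitted.

```python
def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def gene_counts_signature(expr_by_cell, top_n):
    """Average, per cell, the genes most correlated with gene counts.

    top_n is a hypothetical parameter chosen for this sketch.
    """
    n_genes = len(expr_by_cell[0])
    # Gene counts: how many genes have a value > 0 in each cell.
    counts = [sum(1 for v in cell if v > 0) for cell in expr_by_cell]
    # Correlate every gene's expression profile with gene counts across cells.
    corr = [
        pearson([cell[g] for cell in expr_by_cell], counts)
        for g in range(n_genes)
    ]
    # Keep the top_n genes with the highest correlation.
    top = sorted(range(n_genes), key=lambda g: corr[g], reverse=True)[:top_n]
    # The signature is the mean expression of those genes in each cell.
    gcs = [sum(cell[g] for g in top) / top_n for cell in expr_by_cell]
    return gcs, top


# Toy data (hypothetical): rows are cells, columns are genes.
expr = [
    [5, 2, 1, 3, 4, 1],
    [3, 1, 0, 2, 6, 1],
    [4, 0, 0, 2, 1, 3],
    [2, 0, 1, 0, 3, 5],
    [0, 0, 0, 7, 0, 9],
    [0, 0, 0, 0, 8, 6],
]

gcs, top = gene_counts_signature(expr, top_n=2)
print(top, gcs)
```

Because the signature is built from genes selected within each dataset, it needs no predefined marker list, which matches the point in the text about not depending on prior knowledge.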

Result3 Performance evaluation across tissues, species, and platforms (data from many sources; we will only skim this part)

When assessed at the single-cell level, CytoTRACE outperformed all evaluated RNA-based features in the validation cohort, achieving a substantial gain in performance over the second-highest-ranking approach.

Similar improvements were observed across many complex systems, including bone marrow differentiation.

In addition, CytoTRACE results were positively correlated with the direction of differentiation in 88% of datasets (the tool's accuracy is checked on data with known developmental trajectories, and of course it all looks good).
Moreover, no significant biases in performance were observed in relation to tissue type, species, the number of cells analyzed, time series experiments versus snapshots of developmental states, or plate-based versus droplet-based technologies (very little bias, which is nice).
Next, the results were also compared with those of RNA velocity (velocyto), and CytoTRACE did quite well.

The authors' explanation for why CytoTRACE is more accurate: This was likely due to the short mRNA half-lives and developmental time scales assumed for the RNA velocity model.
There is also a validation of batch effects across multiple samples, but nowadays we generally remove batch effects first and then run trajectory analysis, so the methods should be combined flexibly.

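Result4 Stem cell–related genes and hierarchies

A key point is made here: CytoTRACE can identify the correct starting point. Honestly, I do not believe this holds in real cases. A quick look at this part of the results is enough; when actually doing trajectory analysis, manual supervision is a must.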

Result5 Application to neoplastic disease

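Cell types still have to be identified. I really do not believe this tool can identify the developmental starting point from the data alone.

Next, let's look at the example code.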

Running CytoTRACE

Load CytoTRACE in R with library(CytoTRACE). The package contains the following contents:

CytoTRACE: function to run CytoTRACE on a custom scRNA-seq dataset
iCytoTRACE: function to run CytoTRACE across multiple, heterogeneous scRNA-seq batches/datasets
plotCytoTRACE: function to generate 2D visualizations of CytoTRACE, phenotypes, and gene expression
Two bone marrow differentiation scRNA-seq datasets (marrow_10x_expr and marrow_plate_expr) with corresponding phenotype labels (marrow_10x_pheno and marrow_plate_pheno)

Example I: Run CytoTRACE on a custom scRNA-seq dataset

Use the bone marrow 10x scRNA-seq dataset to run CytoTRACE

results <- CytoTRACE(marrow_10x_expr)

CytoTRACE will automatically run in fast mode, a subsampling approach used to reduce runtime and memory usage, when the number of cells in the dataset exceeds 3,000. Users can additionally multi-thread using 'ncores' (default = 1) or indicate subsampling size using 'subsamplesize' (default = 1,000 cells). Run the following dataset in fast mode using 8 cores and a subsample size of 1,000.

results <- CytoTRACE(marrow_10x_expr, ncores = 8, subsamplesize = 1000)

The output is a list object containing numeric values for CytoTRACE (values ranging from 0 (more differentiated) to 1 (less differentiated)), ranked CytoTRACE, GCS, and gene counts, a numeric vector of the Pearson correlation between each gene and CytoTRACE, a numeric vector of the Pearson correlation between each gene and gene counts, the IDs of filtered cells, and a normalized gene expression table (see package documentation for more details).
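
To make the relationship between a "ranked" score and a 0-to-1 score concrete, here is a small Python sketch that ranks raw per-cell scores and rescales the ranks so that the lowest-scoring (most differentiated) cell gets 0 and the highest-scoring (least differentiated) cell gets 1. It only illustrates how such a scale can be built; whether the package computes its values exactly this way, and how it breaks ties, are assumptions, and `rank_scale` and the raw scores are hypothetical.

```python
def rank_scale(scores):
    """Rank scores (1 = lowest) and rescale the ranks to the range [0, 1].

    Assumes at least two cells; ties are broken by order of appearance.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    lowest, highest = min(ranks), max(ranks)
    scaled = [(r - lowest) / (highest - lowest) for r in ranks]
    return ranks, scaled


# Hypothetical raw scores for five cells.
raw = [0.8, 3.1, 1.7, 2.4, 0.2]
ranks, scaled = rank_scale(raw)
print(ranks)   # rank of each cell, 1 = most differentiated
print(scaled)  # the same ordering mapped onto 0..1
```

A rank-based scale like this only carries ordering information within one run, so values from separately analyzed datasets should not be compared directly.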
Example II: Run iCytoTRACE on multiple scRNA-seq batches/datasets

Run iCytoTRACE on a list containing two bone marrow scRNA-seq datasets profiled on different platforms, 10x and Smart-seq2

datasets <- list(marrow_10x_expr, marrow_plate_expr)
results <- iCytoTRACE(datasets)

The output is a list object containing numeric values for the merged CytoTRACE (values ranging from 0 (more differentiated) to 1 (less differentiated)), ranked CytoTRACE, GCS, gene counts, the Scanorama-corrected gene expression matrix, the merged low dimensional embedding, and the IDs of filtered cells (see package documentation for more details).

Example III: Plot CytoTRACE and iCytoTRACE results

Visualizing CytoTRACE results

Generate 2D plots and tables to visualize CytoTRACE, known phenotypes, and gene expression. The current implementation uses t-SNE for dimensional reduction but users can also input their own embeddings. At minimum, the plotCytoTRACE function takes as input a list object generated by either the CytoTRACE or iCytoTRACE functions. Users can also optionally provide phenotype labels or gene names to generate additional plots. Boxplots of CytoTRACE by phenotype labels are automatically generated when phenotype labels are provided.

plotCytoTRACE(results, phenotype = marrow_10x_pheno, gene = "Kit")

The function saves two files to disk:

a pdf of 2D embedded plots colored by CytoTRACE and, if provided, phenotype labels and gene expression.
a tab-delimited text file containing a table of CytoTRACE values, t-SNE embeddings, and, if provided, phenotype labels and gene expression values.

Visualizing genes associated with CytoTRACE

Generate a bar plot to visualize genes associated with CytoTRACE. At minimum, the plotCytoGenes function takes as input a list object generated by either the CytoTRACE or iCytoTRACE functions. Users can also indicate the number of genes and colors to display.

plotCytoGenes(results, numOfGenes = 10)

The function saves one file to disk:

a pdf of bar plots indicating the genes associated with least and most differentiated cells based on correlation with CytoTRACE.

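For reference, see the CytoTRACE website.

The code is quite simple, so give it a try yourself. Judging from the results, though, manual supervision is indispensable.

Life is good, and even better with you.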
