Literature walkthrough: VECTOR for trajectory (pseudotime) analysis of 10X single-cell (10X spatial transcriptomics) data

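Hello. Yesterday we shared example code for VECTOR in the post "10X single-cell (10X spatial transcriptomics) trajectory (pseudotime) analysis with VECTOR". The paper was published in Cell Reports in August 2020. Its principles deserve a careful summary, so this short post walks through the paper, picks out the key points, and looks at what sets the tool apart and how it is applied, so that we know what to expect from it.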

SUMMARY

A key step in trajectory inference is the determination of starting cells (anyone who has done this will know it well, which is why cell annotation has to come before any customized analysis), which is typically done by using manually selected marker genes (most cell annotation methods still rely on manually chosen markers; similarity-mapping approaches currently have too many problems). In this study, we find that the quantile polarization (an unfamiliar term at this point) of a cell's principal-component values is strongly associated with their respective states in development hierarchy (PC values correlate with a cell's developmental state), and therefore provides an unsupervised solution for determining the starting cells (this deserves a closer look). Based on this finding, we developed a tool named VECTOR that infers vectors of developmental directions for cells in Uniform Manifold Approximation and Projection (UMAP). In seven datasets of different developmental scenarios, VECTOR correctly identifies the starting cells and successfully infers the vectors of developmental directions. VECTOR is freely available for academic use at https://github.com/jumphone/Vector. (The showcased examples look great, but then every paper says that.)

INTRODUCTION

Let's distill the key points here.

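The algorithms of trajectory inference (TI) methods (monocle, PAGA, slingshot, and so on; these tools should be familiar to everyone) share two common components by design: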

  • the use of dimensional reduction, clustering, or graph-building techniques to convert scRNA-seq data into a simplified representation of trajectory, and the ordering of cells along the trajectory. (Dimensionality reduction and clustering: routine.)
  • because there may be many alternative trajectories to choose from, most TI methods require the use of prior information, such as a set of known marker genes, to determine the starting cells (SCs) of the correct trajectory. (Put bluntly, cell annotation is needed to decide where development starts; a trajectory analysis without cell annotation is not to be taken seriously.)

Manual, subjective marker selection can indeed introduce large errors. Recently, a new study found that RNA velocity (velocyto does well here, with less manual intervention), the time derivative of gene expression states, could be estimated by modeling the relationship between unspliced and spliced mRNAs, making it possible to deduce the future transcriptional states of cells and consequently the developmental trajectories without the need of prior information for determining SCs (the trajectory is inferred from the balance of unspliced and spliced transcripts; high-impact papers use this approach often). RNA velocity has been used, without any prior information, to identify a new developmental model of neural crest lineage cells, demonstrating its usefulness in developmental lineage analysis.
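To make "the time derivative of gene expression states" concrete, here is the rate model commonly used in the RNA velocity literature. The symbols are not from this post: we take u and s to be the unspliced and spliced mRNA abundance of one gene, α the transcription rate, β the splicing rate, and γ the degradation rate.

```latex
\begin{aligned}
\frac{du}{dt} &= \alpha - \beta\,u, \\
\frac{ds}{dt} &= \beta\,u - \gamma\,s .
\end{aligned}
```

Under this model the velocity of a gene is ds/dt: it is positive when unspliced mRNA is abundant relative to spliced mRNA (the gene is being induced) and negative in the opposite case. This is why intron reads are needed to quantify u.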

Now for the drawbacks of RNA velocity (velocyto):

  • the need to reanalyze raw sequencing data to determine intron reads for quantifying unspliced mRNAs, which is time-consuming and sometimes may not be possible because of the limitation of the sequencing platforms. (This is hardly a real drawback.)

PCA is indeed a required step in single-cell analysis today. Cells at different developmental states have been shown to have distinct patterns of PC values. However, the patterns of a cell's PC values have not yet been fully explored in the current TI methods (I have reservations about this claim). In this study, we observed that the averaged polarization of a cell's PC values across a large number of PC subspaces is strongly correlated with their developmental states, with SCs having the most polarized PC values (worth noting: had anyone noticed that starting cells have such distinctive PC values? We will check the method shortly). We thus provided an unsupervised solution for determining the SCs based on the averaged polarization of a cell's PC values (determining the starting point from PC values cannot really be called unsupervised; in practice it has to be semi-supervised). The authors' examples naturally look good, but some care is needed when we apply the tool to our own data.

RESULTS

The first step validates the tool's reliability on two single-cell datasets with well-defined cell annotations.

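In a typical PCA workflow we use only the first ten or so PCs for downstream analysis, and Seurat computes 50 PCs by default. The authors instead use 150 PCs here; the rationale needs to be checked in the methods.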


Analysis of these datasets showed the following. For both oligodendrocyte and enterocyte lineages, we found that cells at earlier developmental stages tend to have more extreme PC values (either very small or very large, i.e., highly polarized; so that is what "polarization" means), while those at later developmental stages tend to have more intermediate PC values (I had never noticed this pattern and need to try it on my own data). Such patterns were more obvious if we inspected the density of the PC value quantiles at all 150 PC subspaces for cells at different developmental stages. (The pattern is quite clear in the figure.)

To quantify the polarization of the PC value quantiles, we next define a quantile polarization (QP) score that averages the polarization of the PC value quantile of a given cell across all 150 PC subspaces (this is the definition of QP; honestly, it is the first time I have seen this approach). The QP score is highly correlated with the developmental hierarchy, with cells at the earliest developmental stages having the greatest QP scores.
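The idea of a QP score can be sketched in a few lines of Python. This is a minimal illustration, not VECTOR's own code: the function name `quantile_polarization` is made up here, and measuring polarization as the scaled distance of a quantile from 0.5 is our assumption about what "polarized" means in practice.

```python
import numpy as np

def quantile_polarization(pcs: np.ndarray) -> np.ndarray:
    """pcs: (n_cells, n_pcs) matrix of PC values. Returns one score per cell."""
    n_cells = pcs.shape[0]
    # Rank every cell within each PC (0 .. n_cells - 1).
    ranks = pcs.argsort(axis=0).argsort(axis=0)
    # Turn ranks into quantiles in [0, 1].
    quantiles = ranks / (n_cells - 1)
    # ASSUMPTION: polarization = distance from the median quantile, scaled to [0, 1].
    polarization = np.abs(quantiles - 0.5) * 2.0
    # Average the polarization over all PCs.
    return polarization.mean(axis=1)

# Toy data: 5 cells, 2 PCs. Cells 0 and 4 sit at the extremes of both PCs.
pcs = np.array([[-3.0,  2.5],
                [-0.1,  0.2],
                [ 0.0, -0.1],
                [ 0.2,  0.0],
                [ 3.1, -2.8]])
qp = quantile_polarization(pcs)
print(qp)
```

In this toy matrix the two cells with extreme values on both PCs get the highest score, and the cells near the middle get the lowest, which is the pattern the paper associates with early and late developmental stages.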

We further experimented with using a different number of PCs, and found that such correlations were robust if the number of PCs used could explain ~20%–80% of the total variance.
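A quick way to check whether a chosen number of PCs falls inside that 20%–80% window is to look at the cumulative explained variance. The sketch below uses made-up variances and a hypothetical helper name, `pcs_within_range`; with real data the per-PC variances and the total variance would come from your PCA result.

```python
import numpy as np

def pcs_within_range(pc_variances, total_variance, low=0.2, high=0.8):
    """Return the PC counts (1-based) whose cumulative explained variance
    lies within [low, high] of the total variance."""
    frac = np.cumsum(np.asarray(pc_variances, dtype=float)) / total_variance
    return [k + 1 for k, f in enumerate(frac) if low <= f <= high]

# Toy values only: variance captured by each of the first six PCs.
variances = [4.0, 8.0, 10.0, 15.0, 10.0, 3.0]
total = 50.0
usable = pcs_within_range(variances, total)
print(usable)  # PC counts that keep the cumulative variance in the window
```

Here using 2, 3, or 4 PCs keeps the cumulative explained variance between 20% and 80%; using 1 PC explains too little and using 5 or more explains too much.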

Inferring trajectories directly on the UMAP, an approach also used in monocle3

In essence, VECTOR treats a two-dimensional UMAP representation of cells as an image and splits it into a number of pixels. After removing those pixels that do not include any cells, VECTOR focuses on the largest connected pixel (LCP) network in UMAP to infer developmental directions (so the tool infers the trajectory on the UMAP plot itself).

By averaging the QP scores of cells inside each pixel, VECTOR identifies the high-scoring pixels that have the greatest QP scores (top 10% by default); in other words, the polarization of the PC values is used to infer the starting cells. The authors also note that this can produce false positives. Here, VECTOR considers not only QP scores but also the connectivity of cells in UMAP; from the high-scoring pixels, it selects the largest connected high-scoring pixels as the starting point of development (combining the UMAP result with the scores to arrive at the starting cells). Those isolated high-scoring pixels that are likely false positives are then filtered out (this step is actually a weak spot).

For each pixel in the LCP network, VECTOR computes a pseudotime score defined as its network distance to the starting point of development (most tools compute it this way). Finally, for a given target pixel VECTOR computes a vector (with arrow and length) by taking into consideration the information of all pixels in the LCP network, including the direction of the unit vector pointing from a selected pixel to the target pixel, the relative pseudotime score between the target pixel and the selected pixel, and the closeness of the selected pixel to the target pixel in the LCP network, and so on (the result is a plot similar to what RNA velocity produces). The arrow direction is the direction of development; arrows are shorter near the start and middle of development and longer near the end.
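The pixel-network steps (LCP, high-scoring pixels, starting point, pseudotime) can be sketched as below. This is our own simplified reading, not VECTOR's implementation: the function names are hypothetical, pixels are treated as connected when they touch in any of the 8 directions (an assumption), and the sketch stops at pseudotime without computing the arrows.

```python
from collections import deque
import math

def neighbors(p):
    # ASSUMPTION: 8-connectivity between grid pixels.
    r, c = p
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]

def components(pixels):
    """Split a collection of pixels into connected components (list of sets)."""
    pixels, seen, out = set(pixels), set(), []
    for p in sorted(pixels):
        if p in seen:
            continue
        comp, queue = {p}, deque([p])
        seen.add(p)
        while queue:
            cur = queue.popleft()
            for q in neighbors(cur):
                if q in pixels and q not in seen:
                    seen.add(q)
                    comp.add(q)
                    queue.append(q)
        out.append(comp)
    return out

def infer_pseudotime(pixel_qp, top_fraction=0.10):
    """pixel_qp: {(row, col): mean QP score} for pixels that contain cells."""
    # 1. Keep only the largest connected pixel (LCP) network.
    lcp = max(components(pixel_qp), key=len)
    # 2. High-scoring pixels: the top fraction by mean QP score.
    ranked = sorted(lcp, key=lambda p: pixel_qp[p], reverse=True)
    high = ranked[:max(1, math.ceil(top_fraction * len(ranked)))]
    # 3. Starting point: largest connected group of high-scoring pixels;
    #    isolated high-scoring pixels are dropped as likely false positives.
    start = max(components(high), key=len)
    # 4. Pseudotime: breadth-first network distance from the starting pixels.
    dist = {p: 0 for p in start}
    queue = deque(sorted(start))
    while queue:
        cur = queue.popleft()
        for q in neighbors(cur):
            if q in lcp and q not in dist:
                dist[q] = dist[cur] + 1
                queue.append(q)
    return start, dist

# Toy example: a row of 10 pixels plus one far-away pixel outside the network.
scores = {(0, c): 0.1 for c in range(10)}
scores[(0, 0)], scores[(0, 1)] = 0.90, 0.85   # connected high-scoring pair
scores[(0, 7)] = 0.95                         # isolated high-scoring pixel
scores[(5, 5)] = 0.99                         # not part of the LCP network
start, pseudotime = infer_pseudotime(scores, top_fraction=0.3)
print(sorted(start), pseudotime[(0, 9)])
```

In the toy example, the pixel at (0, 7) has the highest score inside the network but is isolated from the other high-scoring pixels, so the connected pair at (0, 0) and (0, 1) becomes the starting point. This also shows why the filtering step is fragile: a true starting population that happens to occupy a single pixel would be discarded in the same way.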

Application examples

The two annotated datasets above performed well: the starting point and the trajectory were both identified correctly.



Applied to other example datasets, the results are also good.


Comparison of VECTOR and RNA velocity (velocyto)

VECTOR performs better here: the RNA velocity result shows a break, which may be caused by the lack of intron reads in these cells. RNA velocity also has difficulty identifying the starting point of development.

Next, VECTOR is applied to data with multiple developmental branches.


The results are good. The tool also provides an option to select the starting point manually.

METHOD

The workflow of VECTOR

Given a two-dimensional UMAP representation of cells, VECTOR treats it as an image and then splits it into a number of pixels. We provide a parameter called "N" for defining the number of pixels in UMAP.
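Splitting the UMAP into pixels amounts to binning the two coordinates onto a grid. A minimal sketch follows, assuming an n × n grid that spans the range of the data; the function names and the exact role of the grid-size argument are our own illustration and may differ from how VECTOR uses "N".

```python
import numpy as np

def assign_pixels(umap_xy, n):
    """Map each cell's 2-D UMAP coordinate to a pixel index on an n x n grid."""
    xy = np.asarray(umap_xy, dtype=float)
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)      # avoid division by zero
    idx = np.floor((xy - lo) / span * n).astype(int)
    idx = np.clip(idx, 0, n - 1)                # cells on the upper edge go to the last pixel
    return [(int(i), int(j)) for i, j in idx]

def mean_score_per_pixel(pixels, scores):
    """Average a per-cell score (for example a QP score) inside each occupied pixel."""
    sums, counts = {}, {}
    for p, s in zip(pixels, scores):
        sums[p] = sums.get(p, 0.0) + s
        counts[p] = counts.get(p, 0) + 1
    return {p: sums[p] / counts[p] for p in sums}

# Toy example: four cells on a 2 x 2 grid.
umap = [[0.0, 0.0], [0.1, 0.1], [10.0, 10.0], [5.0, 5.0]]
cell_scores = [0.8, 0.6, 0.2, 0.4]
pixels = assign_pixels(umap, n=2)
pixel_scores = mean_score_per_pixel(pixels, cell_scores)
print(pixels, pixel_scores)
```

Only occupied pixels appear in the resulting dictionary, which matches the step of removing pixels that contain no cells. A larger grid gives finer arrows but leaves fewer cells per pixel, so the averaged scores become noisier.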


Besides the data processing, the method also draws on image-processing techniques.

Why not give it a try?

Life is good, and even better with you.
