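About RNA velocity
RNA velocity is the time derivative of a gene's expression state. In common single-cell RNA-seq protocols it can be estimated directly by distinguishing unspliced (nascent) from spliced (mature) mRNA: their relative abundances reflect the rates of splicing and degradation, and therefore the dynamics of gene expression during differentiation. A newly transcribed mRNA contains both exons and introns; splicing removes the introns and yields the spliced mRNA that encodes protein. The abundance of spliced mRNA is determined jointly by the splicing rate of the immature mRNA and the degradation rate. The ratio of unspliced to spliced transcripts is then used to infer the state of a cell (transitioning vs. steady) and its direction (trajectory).
RNA velocity lets us determine whether a gene of interest is being induced or repressed in a given target cell population. Combined with pseudotime trajectory inference, this information can be used to predict cell fate decisions.
As a simplified illustration: if cell types A, B, and C show increasing RNA splicing rates (A lowest, C highest), the inferred differentiation direction is A → B → C.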
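As a rough illustration of the model behind this, the sketch below simulates the two-equation kinetics commonly used for RNA velocity, du/dt = α − βu and ds/dt = βu − γs, where u and s are the unspliced and spliced abundances of one gene. The function names and rate values are made up for illustration and are not taken from velocyto or scVelo.

```python
def simulate(alpha, beta, gamma, u0, s0, t_end, dt=0.01):
    """Euler integration of du/dt = alpha - beta*u, ds/dt = beta*u - gamma*s.

    alpha: transcription rate, beta: splicing rate, gamma: degradation rate.
    Returns a list of (u, s) tuples, one per time step (including t = 0).
    """
    u, s = u0, s0
    trajectory = [(u, s)]
    for _ in range(round(t_end / dt)):
        du = alpha - beta * u
        ds = beta * u - gamma * s
        u += du * dt
        s += ds * dt
        trajectory.append((u, s))
    return trajectory


def velocity(u, s, beta, gamma):
    """RNA velocity of the gene: the time derivative of spliced abundance."""
    return beta * u - gamma * s


# Illustrative rates only (assumed values, arbitrary units).
BETA, GAMMA = 1.0, 0.5

# Induction: transcription switched on, starting from no mRNA.
induction = simulate(alpha=2.0, beta=BETA, gamma=GAMMA, u0=0.0, s0=0.0, t_end=50.0)

# Repression: transcription switched off, starting from the induced steady state
# u = alpha/beta = 2, s = alpha/gamma = 4.
repression = simulate(alpha=0.0, beta=BETA, gamma=GAMMA, u0=2.0, s0=4.0, t_end=50.0)

for label, traj in (("induction", induction), ("repression", repression)):
    u, s = traj[100]  # state at t = 1.0
    print(label, "u=%.3f s=%.3f velocity=%.3f" % (u, s, velocity(u, s, BETA, GAMMA)))
```

At steady state the velocity is zero and u = (γ/β)·s. Cells with more unspliced mRNA than this ratio predicts have positive velocity (the gene is being induced), and cells with less have negative velocity (the gene is being repressed).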
Downloading velocyto
Velocyto consists of two main components:
A command line interface (CLI), that is used to run the pipeline that generates spliced/unspliced expression matrices.
A library including functions to estimate RNA velocity from the above mentioned data matrices.
Note: the BAM file should be named
possorted_genome_bam.bam
How to merge loom files
Integrating loom files
velocyto run10x -m /home/user/myh/ref_data/mm10_allTracks.gtf /home/user/myh/raw_data/LXY/clean_data/non_mCherry/non_mcherry_loom/non_mCherry /home/user/myh/ref_data/refdata-gex-mm10-2020-A/genes/genes.gtf
This creates a velocyto folder inside the sample folder, and the loom file is in it.
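The same call can be scripted from Python. The sketch below only assembles the `velocyto run10x` command shown above and checks the expected input and output locations. The helper names are hypothetical, and the `outs/` location of the BAM file is an assumption based on the usual Cell Ranger output layout.

```python
import subprocess
from pathlib import Path


def build_run10x_cmd(sample_dir, genes_gtf, mask_gtf=None):
    """Assemble `velocyto run10x [-m MASK] SAMPLE_DIR GENES_GTF` as an argument list."""
    cmd = ["velocyto", "run10x"]
    if mask_gtf is not None:
        cmd += ["-m", str(mask_gtf)]  # optional repeat-mask GTF
    cmd += [str(sample_dir), str(genes_gtf)]
    return cmd


def expected_bam_path(sample_dir):
    """Assumed BAM location: <sample_dir>/outs/possorted_genome_bam.bam."""
    return Path(sample_dir) / "outs" / "possorted_genome_bam.bam"


def expected_loom_path(sample_dir):
    """Loom output location: <sample_dir>/velocyto/<sample folder name>.loom."""
    sample_dir = Path(sample_dir)
    return sample_dir / "velocyto" / (sample_dir.name + ".loom")


def run_velocyto(sample_dir, genes_gtf, mask_gtf=None):
    """Run velocyto and return the loom path; raises if input or output is missing."""
    bam = expected_bam_path(sample_dir)
    if not bam.exists():
        raise FileNotFoundError("BAM not found: %s" % bam)
    subprocess.run(build_run10x_cmd(sample_dir, genes_gtf, mask_gtf), check=True)
    loom = expected_loom_path(sample_dir)
    if not loom.exists():
        raise FileNotFoundError("velocyto finished but no loom at: %s" % loom)
    return loom
```

Calling `run_velocyto(sample_dir, genes_gtf, mask_gtf)` with the three paths from the command above should be equivalent to running it in the shell, with the extra file checks before and after.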
Loom files do not support writing and reading concurrently. They also do not support journalling, so if something happens during a write, the entire file can be lost. Therefore, do not use loom files as your primary data storage. They are for working with data, not keeping it safe.
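For the loom merging mentioned above, one possible approach is `loompy.combine`, which writes several loom files into a new output file. This is a minimal sketch: the helper names are hypothetical, `key="Accession"` assumes the velocyto loom files carry an `Accession` row attribute, and the sketch refuses to overwrite an existing file because loom files are easy to corrupt.

```python
from pathlib import Path


def find_loom_files(root):
    """Return all .loom files below `root`, sorted for a reproducible merge order."""
    return sorted(str(p) for p in Path(root).rglob("*.loom"))


def merge_looms(loom_files, output_file):
    """Combine several loom files into a new file using loompy.combine."""
    import loompy  # imported here so the path helper above works without loompy

    if Path(output_file).exists():
        raise FileExistsError("refusing to overwrite: %s" % output_file)
    if len(loom_files) < 2:
        raise ValueError("need at least two loom files to merge")
    # key="Accession" (assumed row attribute) aligns genes across the input files.
    loompy.combine(list(loom_files), str(output_file), key="Accession")
    return str(output_file)
```

Keep the original per-sample loom files until the merged file has been opened and checked.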
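------------- The above are the steps I tried earlier; below is the overall workflow for RNA velocity analysis -------------
- Quality control of the clean FASTQ files: FastQC & MultiQC
- Tidy up the FASTQ file names: when one sample is split across several FASTQ files, keep the sample-name prefix (the "XXX" in “XXX_S1_L001_R1_001”) identical and vary only the lane field (L001, L002, etc.), so that cellranger recognizes them as FASTQ files of the same sample. This avoids having to merge BAM or loom files later
- Convert the BAM file to a loom file with velocyto
- Merge the original count matrix with the corresponding loom file using scVelo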
import scanpy as sc
import scvelo as scv
adata2_to_merge = sc.read_10x_mtx('/home/user/mCherry/mCherry_cellranger_count/outs/filtered_feature_bc_matrix',  # the directory with the `.mtx` file
                                  var_names='gene_symbols',  # use gene symbols for the variable names (variables-axis index)
                                  cache=True)
# loom file produced by velocyto, holding the spliced/unspliced count layers
ldata2 = scv.read('/home/user/mCherry_cellranger_count/velocyto/mCherry_cellranger_count.loom', cache=True)
# keep the cells and genes shared by both objects and attach the loom layers
adata2 = scv.utils.merge(adata2_to_merge, ldata2)
- Use Scanpy for the subsequent preprocessing, merging, batch correction, clustering, annotation, etc.
- Run the downstream RNA velocity analysis on the annotated data with scVelo
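The last step in the list above can be sketched with the basic scVelo workflow. This is a minimal sketch under assumptions, not the exact analysis used here: the parameter values follow common scVelo tutorial settings, `basis="umap"` assumes a UMAP embedding was already computed in the Scanpy step, and `missing_layers` and `run_velocity` are hypothetical helper names.

```python
REQUIRED_LAYERS = ("spliced", "unspliced")


def missing_layers(layer_names):
    """Return the required velocity layers that are absent from `layer_names`."""
    return [name for name in REQUIRED_LAYERS if name not in layer_names]


def run_velocity(adata, basis="umap"):
    """Basic scVelo workflow on an annotated AnnData object merged with loom data."""
    import scvelo as scv  # imported here so the layer check works without scVelo

    # The merge with the velocyto loom file should have added these layers.
    missing = missing_layers(list(adata.layers.keys()))
    if missing:
        raise ValueError("missing layers (was the loom file merged?): %s" % missing)

    # Gene filtering and normalization (assumed tutorial-style settings).
    scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
    # First- and second-order moments across nearest neighbours.
    scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
    # Per-gene velocities, then the cell-to-cell transition graph.
    scv.tl.velocity(adata)
    scv.tl.velocity_graph(adata)
    # Project velocities onto the existing embedding as streamlines.
    scv.pl.velocity_embedding_stream(adata, basis=basis)
    return adata
```

Checking the layers first gives a clear error when the loom merge was skipped or matched no cells, instead of a less obvious failure inside scVelo.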