Introduction:
- https://www.nature.com/articles/s41592-020-0748-5
- https://www.biorxiv.org/content/10.1101/519660v1.full.pdf
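In short, if the cell groupings produced by your current clustering software are unsatisfactory, this tool can cluster the cells and visualize the result as a branching, phylogeny-style tree.
Detailed tutorials
- 生信技能树: https://cloud.tencent.com/developer/article/1605955
- Original GitHub repository: https://github.com/GregorySchwartz/too-many-cells
Installation
To avoid installing the many dependencies, everything below is run through Docker.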
docker pull gregoryschwartz/too-many-cells:0.2.2.0
Start the Docker container
docker run -it --rm -v "/home/luohb:/share/nas1/Data/Users/luohb/Personalization/20191206/TooManyCells" gregoryschwartz/too-many-cells:0.2.2.0 -h
too-many-cells, Gregory W. Schwartz. Clusters and analyzes single cell data.
Usage: too-many-cells (make-tree | interactive | differential | diversity |
paths)
Available options:
-h,--help Show this help text
Available commands:
make-tree
interactive
differential
diversity
paths
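Preparing the input files
The input can be either a directory (containing the three files produced by 10X cellranger) or an ordinary expression matrix in csv format.
1. Matrix:
PS: if you use a count matrix file, remember that the first field of the header row is empty, so the line effectively starts with a comma. The row-name and column-name labels do not need double quotes.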
"","A22.D042044.3_9_M.1.1","C5.D042044.3_9_M.1.1","D10.D042044.3_9_M.1.1","E13.D042044.3_9_M.1.1","F19.D042044.3_9_M.1.1","H2.D042044.3_9_M.1.1","I9.D042044.3_9_M.1.1",...
"0610005C13Rik",0,0,0,0,0,0,0,...
"0610007C21Rik",0,112,185,54,0,96,42,...
"0610007L01Rik",0,0,0,0,0,153,170,...
"0610007N19Rik",0,0,0,0,0,0,0,...
"0610007P08Rik",0,0,0,0,0,19,0,...
"0610007P14Rik",0,58,0,0,255,60,0,...
"0610007P22Rik",0,0,0,0,0,65,0,...
"0610008F07Rik",0,0,0,0,0,0,0,...
"0610009B14Rik",0,0,0,0,0,0,0,...
...
2. Labels file
item,label
AAACCTGCAGTAACGG-1,Marrow
AAACGGGAGACCGGAT-1,Marrow
AAACGGGAGCGCTCCA-1,Marrow
AAACGGGAGGACGAAA-1,Marrow
AAACGGGAGGTACTCT-1,Marrow
...
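The labels file can hold each cell's sample of origin or manually assigned group labels. It is used only to color the final figure and does not affect the branching structure of the tree.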
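Since the labels file only needs one `item,label` row per cell, it can be generated from the header row of the count matrix. The following is a minimal sketch under stated assumptions: the file names, the toy matrix, and the single label value `Marrow` applied to every cell are placeholders for illustration, not part of the original workflow.

```shell
#!/bin/sh
# Sketch: derive a labels file from the header row of a count matrix.
# All file names and the label value are hypothetical; adjust to your data.
workdir=$(mktemp -d)
matrix="$workdir/count.csv"
labels="$workdir/labels.csv"

# Toy matrix in the same layout as above (genes in rows, cells in columns).
cat > "$matrix" <<'EOF'
"","cellA","cellB","cellC"
"gene1",0,3,1
"gene2",5,0,2
EOF

# Take the header row, strip quotes and carriage returns, put one field
# per line, skip the empty first field, and emit "item,label" rows.
{
  echo "item,label"
  head -n 1 "$matrix" | tr -d '"\r' | tr ',' '\n' |
    awk -v lab="Marrow" 'NR > 1 && $0 != "" { print $0 "," lab }'
} > "$labels"

cat "$labels"
```

In practice the label would usually differ per cell (for example, the sample each barcode came from), so the `awk` step is where a lookup against your own metadata would go.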
Running
docker run -it --rm -v /share/nas1/Data/Users/luohb/Personalization/TooManyCells/test:/test \
gregoryschwartz/too-many-cells:0.2.2.0 make-tree \
--matrix-path /test/count.csv \
--labels-file /test/OrigIdent.labels.csv \
--draw-collection "PieRing" \
--output /test/LabelsBySamples > log
The result looks something like this
"Pruning" the branches
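With the default parameters the branches are too fine-grained. There are two ways to adjust this:
- Set --min-size directly: it specifies the minimum number of cells in a branch. For example, --min-size 100 sets the minimum leaf size to 100 cells.
- Set --smart-cutoff, which changes the number of leaves on the tree using a cutoff of n × the median absolute deviation (MAD). It can be used together with --min-size, --max-proportion, --min-distance, or --min-distance-search.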
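To get a feel for what an n × MAD cutoff means, the arithmetic can be sketched by hand. This is only an illustration: the node sizes are made up, and the exact formula is an assumption (here, median + n × MAD over node sizes, which is how I read the tool's help text); check `make-tree -h` for the authoritative definition.

```shell
#!/bin/sh
# Sketch: compute median + n * MAD for a list of node sizes.
# The sizes and n are hypothetical; the formula is an assumption.
sizes="4 8 15 16 23 42 108"
n=1

# Median of numbers read from stdin, one per line.
median() {
  sort -n | awk '{ v[NR] = $1 }
    END { if (NR % 2) print v[(NR + 1) / 2];
          else print (v[NR / 2] + v[NR / 2 + 1]) / 2 }'
}

med=$(printf '%s\n' $sizes | median)

# MAD = median of the absolute deviations from the median.
mad=$(printf '%s\n' $sizes |
  awk -v m="$med" '{ d = $1 - m; if (d < 0) d = -d; print d }' | median)

cutoff=$(awk -v m="$med" -v d="$mad" -v n="$n" 'BEGIN { print m + n * d }')
echo "median=$med MAD=$mad cutoff=$cutoff"
```

With these toy sizes the median is 16 and the MAD is 8, so n = 1 gives a cutoff of 24. A larger n raises the cutoff and therefore prunes more aggressively.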
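Also, we do not need to recompute the whole tree! The --prior option lets us supply the previous results. When using --prior we can also drop --matrix-path to speed things up, although some features may then be unavailable.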
# --draw-collection "PieChart" draws the leaves as pie charts
docker run -it --rm -v /share/nas1/Data/Users/luohb/Personalization/TooManyCells/test:/test \
gregoryschwartz/too-many-cells:0.2.2.0 make-tree \
--prior /test/LabelsBySamples --labels-file /test/OrigIdent.labels.csv \
--smart-cutoff 1 --min-size 1 \
--draw-collection "PieChart" \
--output /test/pruned_LabelsBySamples > log1_2
The final result looks something like this
Extracting a subset
cp log3_2 clusters_pruned.csv
vi clusters_pruned.csv
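# inside vim (type ^M as Ctrl-V followed by Ctrl-M)
:%s/^M$//g
The per-node results captured through Docker contain stray carriage returns and do not display properly, so they need to be cleaned up manually into the following form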
$ head clusters_pruned.csv
cell,cluster,path
AAACGGGAGGTGTTAA.1,9,9/8/7/6/5/4/3/2/1/0
AACACGTTCGGCGGTT.1,9,9/8/7/6/5/4/3/2/1/0
AACCGCGGTATATGAG.1,9,9/8/7/6/5/4/3/2/1/0
ACACCCTTCTGGTTCC.1,9,9/8/7/6/5/4/3/2/1/0
ACCTTTAAGGTGTTAA.1,9,9/8/7/6/5/4/3/2/1/0
ACGAGGACACGTTGGC.1,9,9/8/7/6/5/4/3/2/1/0
AGGGAGTCAGGCTCAC.1,9,9/8/7/6/5/4/3/2/1/0
AGGGATGAGCGATAGC.1,9,9/8/7/6/5/4/3/2/1/0
AGTGGGAAGATGTAAC.1,9,9/8/7/6/5/4/3/2/1/0
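The manual vim cleanup above can also be done non-interactively. A minimal sketch, assuming the only problem is the trailing carriage returns; the file names mirror the ones used above, but here they are created in a temporary directory with toy content.

```shell
#!/bin/sh
# Sketch: strip carriage returns from the captured cluster table.
# The input is a toy stand-in for the real log file.
workdir=$(mktemp -d)
raw="$workdir/log3_2"
clean="$workdir/clusters_pruned.csv"

# Simulate output whose lines end in CR LF.
printf 'cell,cluster,path\r\nAAACGGGAGGTGTTAA.1,9,9/8/7/6/5/4/3/2/1/0\r\n' > "$raw"

# Delete every carriage return; everything else is left untouched.
tr -d '\r' < "$raw" > "$clean"

head "$clean"
```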
Annotating node numbers
# --draw-node-number adds the node numbers to the figure
docker run -it --rm -v /share/nas1/Data/Users/luohb/Personalization/TooManyCells/test:/test \
gregoryschwartz/too-many-cells:0.2.2.0 make-tree \
--prior /test/LabelsBySplitGroup \
--labels-file /test/SplitGroup.labels.csv --smart-cutoff 1 --min-size 1 --draw-collection "PieChart" \
--draw-node-number \
--output /test/number_pruned_LabelsBySplitGroup > log7
You can then extract cell subsets using the barcodes that belong to each node.
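One way to do that extraction is to filter the cleaned cluster table on its path column. The sketch below assumes, based on the example rows shown earlier, that `path` lists the leaf cluster followed by its ancestors up to the root, separated by `/`. The node number and the `CELL_X`/`CELL_Y` rows are made up for illustration.

```shell
#!/bin/sh
# Sketch: list the barcodes of all cells that sit under a given node.
# Toy table; CELL_X / CELL_Y and node 11 are hypothetical.
workdir=$(mktemp -d)
clusters="$workdir/clusters_pruned.csv"
node=11
out="$workdir/node_${node}_barcodes.txt"

cat > "$clusters" <<'EOF'
cell,cluster,path
AAACGGGAGGTGTTAA.1,9,9/8/7/6/5/4/3/2/1/0
CELL_X.1,12,12/11/1/0
CELL_Y.1,14,14/13/11/1/0
EOF

# Skip the header, split the path on "/", and print the barcode when
# any path component equals the requested node.
awk -F, -v node="$node" 'NR > 1 {
  n = split($3, p, "/")
  for (i = 1; i <= n; i++) if (p[i] == node) { print $1; break }
}' "$clusters" > "$out"

cat "$out"
```

Matching on whole path components rather than on a substring avoids confusing node 1 with node 11. The resulting barcode list can then be used to subset the matrix in whatever downstream tool you prefer.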
Gene expression
docker run -it --rm -v /share/nas1/Data/Users/luohb/Personalization/TooManyCells/test:/test \
gregoryschwartz/too-many-cells:0.2.2.0 make-tree \
--prior /test/LabelsBySplitGroup \
--matrix-path /test/count.csv \
--labels-file /test/SplitGroup.labels.csv \
--smart-cutoff 1 \
--min-size 1 \
--draw-leaf "DrawItem (DrawThresholdContinuous [(\"gene1\", 0), (\"gene2\", 0)])" \
--draw-colors "[\"#e41a1c\", \"#377eb8\", \"#4daf4a\", \"#eaeaea\"]"\
--draw-scale-saturation 10 \
--output /test/out_gene_expression \
> clusters_pruned_gene_expression.csv
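The result looks something like this
Differential gene analysis
Differential expression analysis is performed according to the labels provided
Differential analysis between two nodes
$ docker run -it --rm -v /share/nas1/Data/Users/luohb/TooManyCells/test:/test \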
gregoryschwartz/too-many-cells:0.2.2.0 differential \
--prior /test/LabelsBySplitGroup \
--matrix-path /test/count.csv \
--labels-file /test/SplitGroup.labels.csv \
-n "([70, 3, 105, 166], [45])" \
> clusters_pruned_gene_expression.csv
Finding marker genes for all nodes
$ cat run12.sh
# -t 5 limits the node level
# +RTS -N26 is placed last: without a closing -RTS, everything after +RTS is read as runtime options
docker run -it --rm -v /share/nas1/Data/Users/luohb/TooManyCells/test:/test \
gregoryschwartz/too-many-cells:0.2.2.0 differential \
--prior /test/LabelsBySplitGroup \
--matrix-path /test/count.csv \
-n "([], [])" \
--normalization "UQNorm" \
--plot-output /test/plot.pdf \
-t 5 \
+RTS -N26
$ sh run12.sh > FindAllMarker.txt