本次学习资料来源,结合视频观看效果更佳,视频相关代码如下:
- 视频如下:https://www.youtube.com/watch?v=uvyG9yLuNSE
- 代码:https://github.com/mousepixels/sanbomics_scripts
- 主代码:https://github.com/mousepixels/sanbomics_scripts/blob/main/single_cell_analysis_complete_class.ipynb
文献中的Fig3如下:
image-20221004215250070.png
本次绘制图C:
Differential gene expression (log-normalized, scaled; see Methods) of AT1 and AT2 cells from COVID-19 and control lungs. Columns, single cells; rows, expression of top-regulated genes. Left bar, lineage markers for AT1 (purple) and AT2 (pink) cells. Colour-coded top lanes indicate expression strength of signatures (log-normalized; see Methods) and group assignment as indicated on the right. exp., expression
数据读取
import scanpy as sc
import scvi
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# 注意dir改成自己的路径
#dir = '/dir/data/GSE171524/'
adata = sc.read_h5ad(dir+'integrated_anno.h5ad')
adata
差异分析
首先提取注释到的AT1与AT2
subset = adata[adata.obs['cell type_sub'].isin(['AT1', 'AT2'])].copy()
subset
总共20741个细胞
image-20221004220933957.png
差异分析可使用SCVI or diffxpy来做
diffxpy方法差异分析
这里需要安装diffxpy这个包,还比较麻烦。需要安装在scanpy的conda环境中,安装如下:
# 先安装依赖 batchglm 需要安装以前的版本:https://pypi.org/project/batchglm/#history 找到0.7.4版本,下载到本地
tar -zxvf batchglm-0.7.4.tar.gz
cd batchglm-0.7.4
pip install -e .
# 再安装 diffxpy
# 下载包到本地:https://github.com/theislab/diffxpy
unzip diffxpy-master.zip
cd diffxpy-master
pip install -e .
这样就安装成功了:
# two options: SCVI or diffxpy
import diffxpy.api as de
subset.X = subset.X.toarray()
len(subset.var)
#20858
# 地表达过滤
sc.pp.filter_genes(subset, min_cells=100)
len(subset.var)
#13412
# 修改一下细胞注释的列名
subset.obs = subset.obs.rename(columns = {'cell type_sub':'cell_type_sub'})
subset.obs
差异分析:
#if want to test between covid/non covid
# res = de.test.wald(data=subset,
# formula_loc= '~ 1 + condition',
# factor_loc_totest='condition'
# )
res = de.test.wald(data=subset,
formula_loc= '~ 1 + cell_type_sub',
factor_loc_totest='cell_type_sub'
)
dedf = res.summary().sort_values('log2fc', ascending = False).reset_index(drop = True)
dedf
结果如下:
image-20221005001446692.png
查看谁是实验组,谁是对照组,并调整为 AT1 vs AT2
subset.obs.cell_type_sub.unique()
# ['AT2', 'AT1']
# Categories (2, object): ['AT1', 'AT2']
most_up = dedf.iloc[0].gene
i = np.where(subset.var_names == most_up)[0][0]
a = subset[subset.obs.cell_type_sub == 'AT1'].X[:, i]
b = subset[subset.obs.cell_type_sub == 'AT2'].X[:, i]
print(f"{most_up} expression:")
print(f"AT1: {a.mean()}")
print(f"AT2: {b.mean()}")
# THBS2 expression:
# AT1: 0.0
# AT2: 0.04352555051445961
dedf['log2fc'] = dedf['log2fc']*-1
dedf = dedf.sort_values('log2fc', ascending = False).reset_index(drop = True)
dedf
结果图:
image-20221005001900728.png
挑选显著差异结果:
dedf = dedf[(dedf.qval < 0.05) & (abs(dedf.log2fc) > .5)]
dedf
结果图如下,有4836个显著差异表达基因:
image-20221005002146043.png
卡表达值:
dedf = dedf[dedf['mean'] > 0.15]
dedf
还有1095个基因:
image-20221005002309756.png
选择top50个基因绘制热图:
#top 25 and bottom 25 from sorted df
genes_to_show = dedf[-25:].gene.tolist() + dedf[:25].gene.tolist()
sc.pl.heatmap(subset, genes_to_show, groupby='cell_type_sub', swap_axes=True)
plt.savefig(dir+"06-intergration_heatmap.png")
结果图:
image-20221005002522720.png
scvi方法差异分析
# 加载上一次保存的模型
model = scvi.model.SCVI.load(dir+ 'model.model', adata)
model
# 差异分析
scvi_de = model.differential_expression(
idx1 = [adata.obs['cell type_sub'] == 'AT1'],
idx2 = [adata.obs['cell type_sub'] == 'AT2']
)
scvi_de
# 显著性结果筛选
scvi_de = scvi_de[(scvi_de['is_de_fdr_0.05']) & (abs(scvi_de.lfc_mean) > .5)]
scvi_de = scvi_de.sort_values('lfc_mean')
scvi_de
# 表达筛选
scvi_de = scvi_de[(scvi_de.raw_normalized_mean1 > .5) | (scvi_de.raw_normalized_mean2 > .5)]
scvi_de
最终筛选到952个基因:
image-20221005003115485.png
#top 25 and bottom 25 from sorted df
genes_to_show = scvi_de[-25:].index.tolist() + scvi_de[:25].index.tolist()
sc.pl.heatmap(subset, genes_to_show, groupby='cell_type_sub', swap_axes=True, layer = 'scvi_normalized',log = True)
plt.savefig(dir+"06-intergration_heatmap_scvi.png")
结果图如下:
image-20221005003207962.png
结果保存
dedf.to_csv(dir+'dedf.csv')
scvi_de.to_csv(dir+'scvi_de.csv')
下次见~