Python单细胞复现2022||06-功能富集分析及绘制Fig 3-d-e-g

本次学习资料来源,结合视频观看效果更佳,视频相关代码如下:

数据读取

import scanpy as sc
import scvi
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# 注意dir改成自己的路径
dir = '/dir/data/GSE171524/'
adata = sc.read_h5ad(dir+'integrated_anno.h5ad')
adata

# 添加一列细胞的样本来源,上次没保存这个
def map_condition(x):
    if 'cov' in x:
        return 'COVID19'
    else:
        return 'Control'

adata.obs['condition'] = adata.obs.Sample.map(map_condition)
adata.obs

# 提取注释到的AT1与AT2
subset = adata[adata.obs['cell type_sub'].isin(['AT1', 'AT2'])].copy()
subset

# 差异分析结果
dedf = pd.read_csv(dir + 'dedf.csv', index_col=0)
scvi_de = pd.read_csv(dir + 'scvi_de.csv', index_col=0)

富集分析

接下来对AT1 vs AT2的差异表达基因进行功能富集分析。

python中使用的是gseapy包进行功能富集分析,需要安装在scanpy的conda环境中,并且这个包需要在联网状态下使用:

# this method requires internet connection
# 安装:conda install -c bioconda gseapy -y
import gseapy as gp 

# 查看包中都有哪些功能数据库集合
gp.get_library_name()

enr = gp.enrichr(gene_list= dedf[dedf.log2fc > 0].gene.tolist(),
                 gene_sets=['KEGG_2021_Human','GO_Biological_Process_2021'],
                 organism='human', # don't forget to set organism to the one you desired!
                 outdir=None, # don't write to disk,
                 background = subset.var_names.tolist()
                )

enr.results

结果如下:

image-20221007012254424.png

比较:Fig 3-d-e

subset.obs = subset.obs.rename(columns = {'cell type_sub':'cell_type_sub'})

sc.pl.violin(subset[subset.obs.cell_type_sub == 'AT2'], 'ETV5', groupby='condition')
plt.savefig(dir+"07-intergration_condition_ETV5.png")

结果图如下:

image-20221007012950572.png

查看基因ETV5在两组中的表达显著性:使用mannwhitneyu进行检验

from scipy import stats
temp = subset[subset.obs.cell_type_sub == 'AT2']
i = np.where(temp.var_names == 'ETV5')[0][0]

a = temp[temp.obs.condition == 'COVID19'].X[:,i].todense()
b = temp[temp.obs.condition == 'Control'].X[:,i].todense()

stats.mannwhitneyu(a, b)

结果如下:MannwhitneyuResult(statistic=array([[17353149.]]), pvalue=array([[2.10908943e-171]]))

对DATP signature打分:Fig 3g

首先从文献的补充材料table S4的找到signature集合:our_DATP_sig,总共163个基因

image-20221008000439213.png

另存为:datp_sig.txt,不要表头

这个基因集是什么来头呢,文中描述如下:

Recent studies have shown that inflammation can induce a cell state that is characterized by failure to fully transition to AT1 cells; this has been termed ‘damage-associated transient progenitors’ (DATPs), ‘alveolar differentiation intermediate’ (ADI), or ‘pre-AT1 transitional cell state’ (PATS)25–27 (hereafter referred to as DATPs)

现在想看DATPs这个基因集在AT1/AT2中不同分组COVID-19和Control的打分情况,是否支持:incomplete transition of AT2 to AT1 cells in COVID-19 lungs(在肺再生过程中,AT2细胞作为AT1细胞的祖细胞)与 除了病毒感染直接破坏肺泡上皮外,COVID-19患者的肺再生过程也受到损害 。

#gene signature, ie, input list of genes from user
with open(dir + 'datp_sig.txt') as f:
    datp_sig = [x.strip() for x in list(f)]
    
sc.tl.score_genes(subset, datp_sig, score_name = 'datp')
subset.obs

现在可以看到数据中多了一列datp的打分:

image-20221008002007250.png

打分可视化:

sc.pl.violin(subset, 'datp', groupby='condition')
plt.savefig(dir+"08-intergration_datp_sig.png")

结果如下:

image-20221008002109711.png

显著性检验:

a = subset[subset.obs.condition == 'COVID19'].obs.datp.values
b = subset[subset.obs.condition == 'Control'].obs.datp.values
stats.mannwhitneyu(a, b)

# 结果
# MannwhitneyuResult(statistic=77312171.0, pvalue=0.0)

# 将打分UMAP可视化
sc.pl.umap(subset, color = 'datp', vmax = 0.5)
plt.savefig(dir+"09-intergration_datp_sig_umap.png")

结果图如下:

image-20221008002517251.png

本次国庆特刊单细胞python图文复现到此更新完毕,关于结果解读如为什么选择绘制基因ETV5、CAV1在AT1/AT2不同分组COVID19 vs Control中的表达等等,视频中对文献结果的解读更精彩,可前往观看~

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容