【1:1复刻Cell】python版单细胞多组差异基因火山图可视化函数

我们之前在python中1:1复刻R版火山图之后(【1:1复刻R版】python版火山图函数一键出图),群里小伙伴希望使用python复刻之前cell文章的多组差异基因火山图(单细胞转录组多组差异基因分析及可视化函数有(更新)),此前我们在R中也是将整个过程包装为一个函数,方便使用出图。此次在python中也是将其分装为一个函数,100%复刻还原,方便使用,只需要提供差异分析列表即可。顺便也演示了python中多组差异基因分析。

image.png

作图使用的是matplotlib结合seaborn,它两的组合我愿称之为python界的ggplot2!首先看看函数参数:我们需要提供的数据是一个dataframe,包含差异基因,logFC,pval的数据框。


image.png

导入库:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.figure import Figure
import adjustText as at

假设你的数据是seurat,,分析差异基因获取数据:

setwd('D:\\KS项目\\公众号文章\\python多组差异基因火山图函数')
library(Seurat)
#================================================================================
#DEGs analysis
 
cluster <- as.character(unique(Idents(uterus)))  
DEGs <- list()
 
for (i in 1:length(cluster)) {
 
  sce = subset(uterus, cell_type==cluster[i])
  dges <- FindMarkers(sce, min.pct = 0.25, 
                      logfc.threshold = 0.25,
                      group.by = "orig.ident",
                      ident.1 ="EEC",
                      ident.2="HC")
  dges = as_tibble(cbind(gene = rownames(dges), dges))
  dges$celltype <- cluster[i]
 
  DEGs[[i]] <- dges
  names(DEGs)[i] <- paste0(cluster[i],"_DEGs")
 
}
 
#================================================================================
#save result
 
diff <- do.call(rbind, DEGs)
write.csv(diff,file = "diff.csv")
 
#================================================================================

假设你的数据是scanpy,,分析差异基因获取数据:

import scanpy as sc
adata = sc.read_h5ad('/home/data_analysis/miloData.h5ad')
adata
all_results = []  # 用来收集不同 celltype 的 DEG 结果
 
for ct in adata.obs["celltype"].unique():
    print(f"analysis {ct} ...")
    adata_ct = adata[adata.obs["celltype"] == ct]
 
    #degs
    sc.tl.rank_genes_groups(
        adata_ct,
        groupby="group",
        reference="healthy", 
        method="wilcoxon",
 
    )
 
    result = adata_ct.uns["rank_genes_groups"]
    groups = result["names"].dtype.names  # group names
 
    for grp in groups: 
        df = pd.DataFrame({
            "gene": result["names"][grp],
            "pvals": result["pvals"][grp],
            "pvals_adj": result["pvals_adj"][grp],
            "logfoldchanges": result["logfoldchanges"][grp],
            "scores": result["scores"][grp]
        })
        df["celltype"] = ct
        df["group"] = grp
        all_results.append(df)
 
#合并所有结果
degs = pd.concat(all_results, ignore_index=True)

最终,函数需要的数据如下:

diff = pd.read_csv('./diff.csv',index_col=0)
diff
image.png

plot,默认不标注基因:

ks_multi_volcano(data=diff,
                 logFC_index="avg_log2FC",
                 pval_index="p_val_adj",
                 group_index = 'celltype')
image.png

plot,标注感兴趣的基因:

def top_genes(group_df, n):
    up = group_df[(group_df['avg_log2FC'] > 1.2) & (group_df['p_val_adj'] <0.05)].nlargest(n, 'avg_log2FC')
    down = group_df[(group_df['avg_log2FC'] < -1.2) & (group_df['p_val_adj'] <0.05)].nsmallest(n, 'avg_log2FC')
    return pd.concat([up, down])
# 对每个分组应用
res = diff.groupby('celltype', group_keys=False).apply(top_genes, n=1);
res
ks_multi_volcano(data=diff,
                 logFC_index="avg_log2FC",
                 pval_index="p_val_adj",
                 group_index = 'celltype',
                lable = True,
                gene_index = 'gene',
                lable_genes = res['gene'])
image.png

完美复刻,觉得我们分享有些用的,点个赞再走呗!

©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容