我们之前在python中1:1复刻R版火山图之后(【1:1复刻R版】python版火山图函数一键出图),群里小伙伴希望使用python复刻之前cell文章的多组差异基因火山图(单细胞转录组多组差异基因分析及可视化函数有(更新)),此前我们在R中也是将整个过程包装为一个函数,方便使用出图。此次在python中也是将其分装为一个函数,100%复刻还原,方便使用,只需要提供差异分析列表即可。顺便也演示了python中多组差异基因分析。
image.png
作图使用的是matplotlib结合seaborn,它两的组合我愿称之为python界的ggplot2!首先看看函数参数:我们需要提供的数据是一个dataframe,包含差异基因,logFC,pval的数据框。
image.png
导入库:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.figure import Figure
import adjustText as at
假设你的数据是seurat,,分析差异基因获取数据:
setwd('D:\\KS项目\\公众号文章\\python多组差异基因火山图函数')
library(Seurat)
#================================================================================
#DEGs analysis
cluster <- as.character(unique(Idents(uterus)))
DEGs <- list()
for (i in 1:length(cluster)) {
sce = subset(uterus, cell_type==cluster[i])
dges <- FindMarkers(sce, min.pct = 0.25,
logfc.threshold = 0.25,
group.by = "orig.ident",
ident.1 ="EEC",
ident.2="HC")
dges = as_tibble(cbind(gene = rownames(dges), dges))
dges$celltype <- cluster[i]
DEGs[[i]] <- dges
names(DEGs)[i] <- paste0(cluster[i],"_DEGs")
}
#================================================================================
#save result
diff <- do.call(rbind, DEGs)
write.csv(diff,file = "diff.csv")
#================================================================================
假设你的数据是scanpy,,分析差异基因获取数据:
import scanpy as sc
adata = sc.read_h5ad('/home/data_analysis/miloData.h5ad')
adata
all_results = [] # 用来收集不同 celltype 的 DEG 结果
for ct in adata.obs["celltype"].unique():
print(f"analysis {ct} ...")
adata_ct = adata[adata.obs["celltype"] == ct]
#degs
sc.tl.rank_genes_groups(
adata_ct,
groupby="group",
reference="healthy",
method="wilcoxon",
)
result = adata_ct.uns["rank_genes_groups"]
groups = result["names"].dtype.names # group names
for grp in groups:
df = pd.DataFrame({
"gene": result["names"][grp],
"pvals": result["pvals"][grp],
"pvals_adj": result["pvals_adj"][grp],
"logfoldchanges": result["logfoldchanges"][grp],
"scores": result["scores"][grp]
})
df["celltype"] = ct
df["group"] = grp
all_results.append(df)
#合并所有结果
degs = pd.concat(all_results, ignore_index=True)
最终,函数需要的数据如下:
diff = pd.read_csv('./diff.csv',index_col=0)
diff
image.png
plot,默认不标注基因:
ks_multi_volcano(data=diff,
logFC_index="avg_log2FC",
pval_index="p_val_adj",
group_index = 'celltype')
image.png
plot,标注感兴趣的基因:
def top_genes(group_df, n):
up = group_df[(group_df['avg_log2FC'] > 1.2) & (group_df['p_val_adj'] <0.05)].nlargest(n, 'avg_log2FC')
down = group_df[(group_df['avg_log2FC'] < -1.2) & (group_df['p_val_adj'] <0.05)].nsmallest(n, 'avg_log2FC')
return pd.concat([up, down])
# 对每个分组应用
res = diff.groupby('celltype', group_keys=False).apply(top_genes, n=1);
res
ks_multi_volcano(data=diff,
logFC_index="avg_log2FC",
pval_index="p_val_adj",
group_index = 'celltype',
lable = True,
gene_index = 'gene',
lable_genes = res['gene'])
image.png
完美复刻,觉得我们分享有些用的,点个赞再走呗!