Sample relationship analysis
Sample clustering analysis
Explore the relationships between samples and identify the key samples.
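Below is a minimal sketch of sample clustering. It assumes a gene-by-sample expression matrix; the toy values and sample names are made up. Samples are compared by the Pearson correlation of their log2(x + 1) values and then grouped with base R hierarchical clustering. Using 1 - correlation as the distance with average linkage is one common choice, not the only one.

```r
# Toy expression matrix: rows = genes, columns = samples (values are made up)
expr <- matrix(
  c(10, 12, 100, 110,
    20, 22,   5,   6,
    50, 48,  50,  52,
     3,  4,  30,  28),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4),
                  c("ctrl_1", "ctrl_2", "treat_1", "treat_2"))
)

# log2(x + 1) compresses the range and avoids log2(0)
log_expr <- log2(expr + 1)

# Pearson correlation between samples (columns)
sample_cor <- cor(log_expr, method = "pearson")

# Turn similarity into a distance and cluster the samples hierarchically
sample_dist <- as.dist(1 - sample_cor)
sample_tree <- hclust(sample_dist, method = "average")

# Cut the tree into two groups of samples
groups <- cutree(sample_tree, k = 2)
print(groups)
# plot(sample_tree) draws the dendrogram in an interactive session
```

With real data, replicates of the same condition should land on the same branch. A replicate that clusters elsewhere is worth a closer look.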
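Exploring differentially expressed genes
Venn diagrams, volcano plots, heatmaps
-Reference metrics for differential expression
fold change (difference in expression level) / p-value
Of the two, fold change is somewhat more important than the p-value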
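Here is a worked example of the fold-change metric using made-up numbers. DESeq2-style tables report fold change on the log2 scale, in the log2FoldChange column. DESeq2 estimates it from a model rather than as a raw ratio of means, so this only illustrates the intuition.

```r
mean_a <- 25   # hypothetical mean expression in condition A
mean_b <- 100  # hypothetical mean expression in condition B

fold_change <- mean_b / mean_a             # 4: expression is four times higher in B
log2_fc <- log2(fold_change)               # 2
log2_fc_reverse <- log2(mean_a / mean_b)   # -2: the same change seen from B to A

# |log2FC| >= 1 corresponds to at least a 2-fold change in either direction
passes_fc_cutoff <- abs(log2_fc) >= 1      # TRUE
print(c(fold_change = fold_change, log2_fc = log2_fc, reverse = log2_fc_reverse))
```

On the log2 scale, increases and decreases are symmetric. That is why a single cutoff on abs(log2FoldChange) covers both up- and down-regulated genes.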
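Working in R
Importing data
-1. First combine the results into a summary table on Linux; every differential expression table then has to be imported into R as a data frame with code.
Reference code:
-2. Import the files in R with a loop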
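deg_files = list.files(path = "<directory>",            # directory holding the DE result files
                       pattern = "<file name pattern>",   # pattern shared by the result file names
                       full.names = TRUE)                 # return full paths, not bare file names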
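-3. Read the files
library(tidyverse)  # provides %>%, rownames_to_column() and bind_rows()
deg_result = read.table(file = deg_files[1]) %>%
  rownames_to_column(var = "gene_id")  # gene IDs are stored as row names
Read the second file
d = read.table(file = deg_files[2]) %>%
  rownames_to_column(var = "gene_id")
deg_result = bind_rows(deg_result, d)
Read all remaining files in a loop (this generalizes the second-file step; run one or the other, not both)
for (i in seq_along(deg_files)[-1]) {  # every file except the first; safe when there is only one file
  d = read.table(file = deg_files[i]) %>%
    rownames_to_column(var = "gene_id")
  deg_result = bind_rows(deg_result, d)
}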
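Instead of growing deg_result inside a loop, you can read every file with lapply() and stack the results once with do.call(rbind, ...). The sketch below uses base R only. To run on its own, it writes two toy files to a temporary directory. The file names, the "DE_results$" pattern and the helper read_deg() are hypothetical. The sketch also records each row's source file, which helps when the files carry no sampleA/sampleB columns.

```r
# Write two small toy result files to a temporary directory (illustration only)
tmp_dir <- tempfile("deg_demo")
dir.create(tmp_dir)
toy <- data.frame(log2FoldChange = c(1.5, -2), padj = c(0.01, 0.20),
                  row.names = c("geneA", "geneB"))
write.table(toy, file.path(tmp_dir, "A_vs_B.DE_results"), sep = "\t", quote = FALSE)
write.table(toy, file.path(tmp_dir, "A_vs_C.DE_results"), sep = "\t", quote = FALSE)

deg_files <- list.files(path = tmp_dir, pattern = "DE_results$", full.names = TRUE)

# Hypothetical helper: read one file and move the row names into a gene_id column
read_deg <- function(path) {
  d <- read.table(path, header = TRUE, sep = "\t")
  d$gene_id <- rownames(d)
  d$source_file <- basename(path)  # remember which comparison the rows came from
  rownames(d) <- NULL
  d
}

# Read every file, then stack the data frames in a single step
deg_result <- do.call(rbind, lapply(deg_files, read_deg))
print(deg_result)
```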
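Preprocessing the differential expression data
library(tidyverse)
select(deg_result, gene_id, sampleA, sampleB, log2FoldChange, padj) %>%  # keep the columns you need (add others as required); log2FoldChange and padj are used below
  mutate(direction = if_else(abs(log2FoldChange) < 1 | is.na(padj) | padj > 0.05, 'NS',
                             if_else(log2FoldChange >= 1, 'UP', 'DOWN'))) -> deg_result  # add a direction column; missing padj counts as NS
filter(deg_result, direction != 'NS') %>%  # count DE genes per comparison and direction
  group_by(sampleA, sampleB, direction) %>%
  summarise(count = n(), .groups = "drop")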
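The same classification is shown below in base R on a toy table with made-up values. It demonstrates why missing padj values need explicit handling. DESeq2 reports padj = NA for some genes, for example those removed by independent filtering. Without is.na(), such rows can end up with NA instead of 'NS'.

```r
# Toy DE table (values made up); g5 has a missing padj
deg <- data.frame(
  gene_id        = c("g1", "g2", "g3", "g4", "g5"),
  sampleA        = "ctrl",
  sampleB        = "treat",
  log2FoldChange = c(2.3, -1.8, 0.4, 3.1, -2.5),
  padj           = c(0.001, 0.01, 0.001, 0.20, NA)
)

# Not significant if the change is small, padj is missing, or padj is too large
is_ns <- abs(deg$log2FoldChange) < 1 | is.na(deg$padj) | deg$padj > 0.05
deg$direction <- ifelse(is_ns, "NS", ifelse(deg$log2FoldChange >= 1, "UP", "DOWN"))

# Count UP / DOWN genes per comparison, ignoring NS
sig <- deg[deg$direction != "NS", ]
print(table(paste(sig$sampleA, sig$sampleB, sep = "_vs_"), sig$direction))
```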
Venn diagram of the differentially expressed genes
-Extract the IDs of the DE genes
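venn.diagram() expects deg_list to be a named list with one vector of gene IDs per set. Below is a minimal base R sketch that builds it from a table that already has a direction column. The values and comparison names are made up. Non-significant rows are dropped, and the IDs are then split by comparison.

```r
# Toy DE table with a direction column already added (values made up)
deg <- data.frame(
  gene_id   = c("g1", "g2", "g3", "g2", "g4", "g1"),
  sampleA   = c("ctrl", "ctrl", "ctrl", "ctrl", "ctrl", "ctrl"),
  sampleB   = c("t1",   "t1",   "t1",   "t2",   "t2",   "t2"),
  direction = c("UP",   "DOWN", "NS",   "UP",   "UP",   "NS")
)

sig <- deg[deg$direction != "NS", ]
comparison <- paste(sig$sampleA, sig$sampleB, sep = "_vs_")

# One character vector of DE gene IDs per comparison, named by comparison
deg_list <- lapply(split(sig$gene_id, comparison), unique)
str(deg_list)
```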
Reference code:
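library(VennDiagram)   # package name is case-sensitive
Pick the fill colours from a palette (one colour per set in deg_list):
library(RColorBrewer)
my_color = RColorBrewer::brewer.pal(4, name = "Set1")
venn.diagram(
  x = deg_list,                # named list: one vector of DE gene IDs per comparison
  filename = "deg_venn.tiff",
  fill = my_color
)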
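The overlaps shown in the Venn diagram can also be pulled out as gene lists with base R set operations, which is handy when picking key genes afterwards. A sketch on a hypothetical deg_list:

```r
# Hypothetical named list of DE gene IDs, one element per comparison
deg_list <- list(
  ctrl_vs_t1 = c("g1", "g2", "g3"),
  ctrl_vs_t2 = c("g2", "g3", "g4"),
  ctrl_vs_t3 = c("g3", "g5")
)

# Genes that are DE in every comparison (the centre of the Venn diagram)
shared_all <- Reduce(intersect, deg_list)

# Genes that are DE only in the first comparison
only_t1 <- setdiff(deg_list$ctrl_vs_t1, unlist(deg_list[-1]))

print(shared_all)
print(only_t1)
```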
Volcano plot
Extract the differential expression results for one pair of samples and use them to draw the volcano plot.
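Below is a minimal sketch of extracting one comparison from the combined table to produce the my_deg_result object passed to EnhancedVolcano. The toy values and the sample names "ctrl" and "t2" are made up. With the tidyverse loaded, filter(deg_result, sampleA == "ctrl", sampleB == "t2") does the same job.

```r
# Toy combined table covering two comparisons (values made up)
deg_result <- data.frame(
  gene_id        = c("g1", "g2", "g1", "g2"),
  sampleA        = c("ctrl", "ctrl", "ctrl", "ctrl"),
  sampleB        = c("t1", "t1", "t2", "t2"),
  log2FoldChange = c(2.1, -0.3, 1.2, -3.0),
  padj           = c(0.001, 0.6, 0.04, 1e-5)
)

# Keep only the rows for one pair of samples (sample names are hypothetical)
my_deg_result <- subset(deg_result, sampleA == "ctrl" & sampleB == "t2")
print(my_deg_result)
```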
Reference code:
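key_gene = c("xxx", "xxx")  # genes of interest to label on the plot
library(EnhancedVolcano)
EnhancedVolcano(
  toptable = my_deg_result,
  lab = my_deg_result$gene_id,
  selectLab = key_gene,
  x = "log2FoldChange",
  y = "padj",
  FCcutoff = 2,     # on the log2 scale: |log2FoldChange| >= 2 means at least a 4-fold change
  pCutoff = 0.01
)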
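If EnhancedVolcano (a Bioconductor package) is not available, a basic volcano plot can be sketched with base graphics instead. This swaps in a different approach and does not reimplement EnhancedVolcano. The toy data are made up, and the plot is written to a temporary PDF. The two thresholds play the same role as FCcutoff and pCutoff.

```r
# Toy results for one comparison (values made up)
res <- data.frame(
  gene_id        = paste0("g", 1:6),
  log2FoldChange = c(3.2, -2.5, 0.3, 1.5, -0.8, -4.1),
  padj           = c(1e-6, 1e-4, 0.5, 0.002, 0.3, 1e-8)
)
fc_cut <- 2     # |log2FoldChange| threshold
p_cut  <- 0.01  # padj threshold

sig <- abs(res$log2FoldChange) >= fc_cut & res$padj < p_cut

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
plot(res$log2FoldChange, -log10(res$padj),
     col = ifelse(sig, "red", "grey40"), pch = 19,
     xlab = "log2 fold change", ylab = "-log10(padj)")
abline(v = c(-fc_cut, fc_cut), h = -log10(p_cut), lty = 2)  # dashed cutoff lines
text(res$log2FoldChange[sig], -log10(res$padj[sig]),
     labels = res$gene_id[sig], pos = 3)                   # label significant genes
invisible(dev.off())
```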
Heatmap
Select the important genes
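key_gene = my_deg_result %>%
  arrange(padj) %>%    # most significant genes first
  slice(1:20) %>%      # keep the top 20
  pull(gene_id)
key_gene_exp = gene_exp[key_gene, ]  # gene_exp: expression matrix with gene IDs as row names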
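Below is a base R sketch of choosing the top genes by padj and pulling their rows from the expression matrix. It assumes gene_exp has gene IDs as row names and uses toy data with made-up values. Sorting first matters, because taking the first rows only yields the most significant genes if the table is already ordered by padj.

```r
# Toy DE results and expression matrix (values made up)
my_deg_result <- data.frame(
  gene_id = c("g1", "g2", "g3", "g4"),
  padj    = c(0.03, 1e-6, NA, 0.001)
)
gene_exp <- matrix(
  1:12, nrow = 4,
  dimnames = list(c("g1", "g2", "g3", "g4"), c("s1", "s2", "s3"))
)

n_top <- 2
ordered <- my_deg_result[order(my_deg_result$padj), ]  # smallest padj first, NA last
key_gene <- head(ordered$gene_id, n_top)

# Subset by row name; drop = FALSE keeps a matrix even for a single gene
key_gene_exp <- gene_exp[key_gene, , drop = FALSE]
print(key_gene_exp)
```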
Plot the heatmap
library(pheatmap)
pheatmap(key_gene_exp,scale = "row")
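-Taking the logarithm (e.g. log2(x + 1)) narrows the gap between very high and very low expression values
-Standardization (scale = "row" turns each gene into z-scores) also reduces the differences in scale between genes
-The expression values are clustered (pheatmap clusters both rows and columns by default)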
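Here is a small illustration of the two transformations above on a toy matrix with made-up values. log2(x + 1) brings high and low expression values into a similar range, and row z-scoring centres each gene at mean 0 with standard deviation 1. pheatmap's scale = "row" centres and scales values in the row direction. t(scale(t(x))) is shown here as a base R way to do the same thing by hand, so the two should match, but this sketch does not call pheatmap itself.

```r
# Toy expression matrix: genes in rows, samples in columns (values made up)
expr <- matrix(
  c(1000, 2000, 4000,
       2,    4,    8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("high_gene", "low_gene"), c("s1", "s2", "s3"))
)

# Log transform: brings values in the thousands and values below ten closer together
log_expr <- log2(expr + 1)

# Row z-score: scale() works on columns, so transpose, scale, transpose back
z <- t(scale(t(log_expr)))

print(round(z, 3))
```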