一开始感觉要直接做三组平均,写了个peakaverage.sh 暂时没用上
#!/bin/bash
# Define input and output directories
INPUT_DIR="/home/data/t210424/ac4c/bam"
OUTPUT_DIR="/home/data/t210424/ac4c/peaksaverage"
# Create output directory if it doesn't exist
mkdir -p "$OUTPUT_DIR"
# Change to input directory
cd "$INPUT_DIR"
# Call peaks for Sham group
macs2 callpeak \
-t Sham2_IP.bam Sham3_IP.bam Sham4_IP.bam \
-c Sham_Input.bam \
-f BAMPE \
-g mm \
-n Sham_group \
--outdir "$OUTPUT_DIR" \
--broad \
--broad-cutoff 0.1 \
-q 0.01 \
--nomodel \
--extsize 200 \
--keep-dup all
# Call peaks for SNI group
macs2 callpeak \
-t SNI1_IP.bam SNI2_IP.bam SNI4_IP.bam \
-c SNI_Input.bam \
-f BAMPE \
-g mm \
-n SNI_group \
--outdir "$OUTPUT_DIR" \
--broad \
--broad-cutoff 0.1 \
-q 0.01 \
--nomodel \
--extsize 200 \
--keep-dup all
单个样本的callpeak其实和之前一样,不过文件名改过了干脆重新call了,脚本是peaksingle.sh
#!/usr/bin/env bash
set -euo pipefail
bam=/home/data/t210424/ac4c/bam
out=/home/data/t210424/ac4c/peaksingle
mkdir -p "$out"
declare -A IN=( ["Sham"]=Sham_Input.bam ["SNI"]=SNI_Input.bam )
for s in Sham2 Sham3 Sham4 SNI1 SNI2 SNI4; do
grp=${s%%[0-9]*} # 提取 Sham 或 SNI
macs2 callpeak \
-t "$bam/${s}_IP.bam" \
-c "$bam/${IN[$grp]}" \
-f BAM -g mm --nomodel --extsize 200 --keep-dup all \
-n "${s}" --outdir "$out"
done
对/home/data/t210424/ac4c/peaksingle里每一个bed进行合并
脚本是ac4c文件夹下的tongji1.sh
#!/usr/bin/env bash
set -euo pipefail
bam=/home/data/t210424/ac4c/bam
bed=/home/data/t210424/ac4c/peaksingle
out=/home/data/t210424/ac4c/peaksingle/bed
mkdir -p "$out"
# 1) union peak(按坐标并集)
cat $bed/*.narrowPeak \
| cut -f1-3 | sort -k1,1 -k2,2n | bedtools merge -i - \
> $out/union.bed
# 2) 统计每个 peak 在各样本的 reads(IP 与 Input 分开做)
bedtools multicov -bams \
$bam/Sham2_IP.bam $bam/Sham3_IP.bam $bam/Sham4_IP.bam \
$bam/SNI1_IP.bam $bam/SNI2_IP.bam $bam/SNI4_IP.bam \
-bed $out/union.bed \
> $out/counts_ip.txt
bedtools multicov -bams \
$bam/Sham_Input.bam $bam/Sham_Input.bam $bam/Sham_Input.bam \
$bam/SNI_Input.bam $bam/SNI_Input.bam $bam/SNI_Input.bam \
-bed $out/union.bed \
> $out/counts_input.txt
然后就转进R了 先做的DESeq2 双因素模型,SHAM对比SNI IP对比INPUT两种因素
library("ashr")
library(DESeq2)
library(tidyverse)
setwd("F:/data")
# 1. 导入 count 矩阵
counts <- read.table("all_counts.txt", header=TRUE, row.names=1, sep="\t")
# 2. 构建样本信息,并指定 factor 水平
# Sham 作为 condition 的 reference,Input 作为 antibody 的 reference
sample_info <- data.frame(
row.names = colnames(counts),
condition = factor(c("Sham","Sham","Sham","SNI","SNI","SNI","Sham","SNI"),
levels = c("Sham","SNI")),
antibody = factor(c("IP","IP","IP","IP","IP","IP","Input","Input"),
levels = c("Input","IP"))
)
# 3. 构建 DESeq2 对象
dds <- DESeqDataSetFromMatrix(countData = counts,
colData = sample_info,
design = ~ condition + antibody + condition:antibody)
# 4. 运行差异分析
dds <- DESeq(dds)
# 5. 提取交互项(Sham vs SNI 在 IP 相对于 Input 的差异)
res <- results(dds, name="conditionSNI.antibodyIP")
# 6. 收缩 log2FC(推荐使用 ashr)
res <- lfcShrink(dds, coef="conditionSNI.antibodyIP", res=res, type="ashr")
# 7. 输出结果
write.csv(as.data.frame(res), file="acRIP_diff_peaks.csv")
结果太少了 又换回单因素了,先用IP/INPUT再做,DESeq只接受整数所以换了limma