DAP-seq workflow

环境及软件

fastpbowtie2samtoolschipseeker 都是需要在服务器上安装的软件。
fastp version 0.23.4 Bowtie 2 version 2.5.4 samtools Version: 1.20
conda create -n Chipseq python=3.8.10 rbase=4.1.0
conda install -c bioconda bowtie2
conda install -c bioconda samtools
启动环境
conda activate Chipseq

使用fastp对rawdata进行质控、过滤及统计

单个样本处理
mkdir -p trimmed_reads

fastp -i "sample1/Sample1_R1.fastq.gz" \
-I "sample1/Sample1_R2.fastq.gz" \
-o "trimmed_reads/Sample1_trimmed_R1.fastq.gz" \
-O "trimmed_reads/Sample1_trimmed_R2.fastq.gz" \
--cut_front --cut_front_window_size 4 --cut_front_mean_quality 20 \
--cut_tail --cut_tail_window_size 4 --cut_tail_mean_quality 20 \
--cut_right --cut_right_window_size 4 --cut_right_mean_quality 20 \
--detect_adapter_for_pe -q 15 -u 40 -e 20 -n 5 -p -P 20 -w 4 \
-R "Sample1" --json trimmed_reads/Sample1_fastp.json \
--html trimmed_reads/Sample1_fastp.html

fastp -i "sample2/Sample2_R1.fastq.gz" \
-I "sample2/Sample2_R2.fastq.gz" \
-o "trimmed_reads/Sample2_trimmed_R1.fastq.gz" \
-O "trimmed_reads/Sample2_trimmed_R2.fastq.gz" \
--cut_front --cut_front_window_size 4 --cut_front_mean_quality 20 \
--cut_tail --cut_tail_window_size 4 --cut_tail_mean_quality 20 \
--cut_right --cut_right_window_size 4 --cut_right_mean_quality 20 \
--detect_adapter_for_pe -q 15 -u 40 -e 20 -n 5 -p -P 20 -w 4 \
-R "Sample2" --json trimmed_reads/Sample2_fastp.json \
--html trimmed_reads/Sample2_fastp.html

fastp -i "input/Input_R1.fastq.gz" \
-I "input/Input_R2.fastq.gz" \
-o "trimmed_reads/Input_trimmed_R1.fastq.gz" \
-O "trimmed_reads/Input_trimmed_R2.fastq.gz" \
--cut_front --cut_front_window_size 4 --cut_front_mean_quality 20 \
--cut_tail --cut_tail_window_size 4 --cut_tail_mean_quality 20 \
--cut_right --cut_right_window_size 4 --cut_right_mean_quality 20 \
--detect_adapter_for_pe -q 15 -u 40 -e 20 -n 5 -p -P 20 -w 4 \
-R "Input" --json trimmed_reads/Input_fastp.json \
--html trimmed_reads/Input_fastp.html
循环处理
创建输出目录
mkdir -p trimmed_reads

定义样本名称数组
samples=("Sample1" "Sample2" "Input")

循环处理每个样本
for sample in "${samples[@]}"
do
fastp -i "${sample}/${sample}_R1.fastq.gz" \
-I "${sample}/${sample}_R2.fastq.gz" \
-o "trimmed_reads/${sample}_trimmed_R1.fastq.gz" \
-O "trimmed_reads/${sample}_trimmed_R2.fastq.gz" \
--cut_front --cut_front_window_size 4 --cut_front_mean_quality 20 \
--cut_tail --cut_tail_window_size 4 --cut_tail_mean_quality 20 \
--cut_right --cut_right_window_size 4 --cut_right_mean_quality 20 \
--detect_adapter_for_pe -q 15 -u 40 -e 20 -n 5 -p -P 20 -w 4 \
-R "${sample}" --json "trimmed_reads/${sample}_fastp.json" \
--html "trimmed_reads/${sample}_fastp.html"
done
使用脚本:fastp.sh
后台提交:nohup ./fastp.sh > fastp.log 2>&1 &
查看输出:tail -f fastp.log
查看任务状态:top
结果解读
输出文件:Input_trimmed_R1.fastq.gz
Input_trimmed_R2.fastq.gz
Sample1_trimmed_R1.fastq.gz
Sample1_trimmed_R2.fastq.gz
Sample2_trimmed_R1.fastq.gz
Sample2_trimmed_R2.fastq.gz

数据比对

完成数据过滤后获得clean data, 需将clean data比对到参考基因组上用于后续分析。
比对需要用到bowtie2软件。

2.1准备参考基因组

Cula_genome.fasta Cula.gtf

2.2使用bowtie2构建索引

脚本: build_index.sh
cat build_index.sh
创建索引文件输出目录;mkdir -p ./cula_genome

```if [ ! -f ./Cula_genome.fasta ]; then```
```  echo "Cula_genome.fasta not found in the current directory. Please make sure the file exists."```
  ```exit 1```
```fi```
```将Cula_genome.fasta文件从当前目录复制到cula_genome目录中```
```cp ./Cula_genome.fasta ./cula_genome/```
```cd ./cula_genome```
```bowtie2-build --threads 5 ./Cula_genome.fasta ./cula_genome```
```echo "Index building completed successfully."```
这步骤完成之后会在cula_genome/目录下生成六个文件,如下:
```cula_genome.1.bt2l```
```cula_genome.2.bt2l```
```cula_genome.3.bt2l```
```cula_genome.4.bt2l```
```cula_genome.rev.1.bt2l```
```cula_genome.rev.2.bt2l```
##2.3将过滤后的cleanreads比对到参考基因组上
脚本:run_bowtie2.sh
```THREADS=6```
```INDEX_PATH="/share/home/stu_yifan/chipseq/ref/cula_genome/cula_genome"```

```echo "线程数: $THREADS"```
```echo "索引路径: $INDEX_PATH"```

```echo "开始处理 Sample 1"```
```READ1="/share/home/stu_yifan/chipseq/rawdata/trimmed_reads/Input_trimmed_R1.fastq.gz"```
```READ2="/share/home/stu_yifan/chipseq/rawdata/trimmed_reads/Input_trimmed_R2.fastq.gz"```
```OUTPUT="/share/home/stu_yifan/chipseq/rawdata/trimmed_reads/Input_aligned.sam"```
```echo "READ1: $READ1"```
```echo "READ2: $READ2"```
```echo "OUTPUT: $OUTPUT"```
```bowtie2 -p $THREADS -q -x $INDEX_PATH -1 $READ1 -2 $READ2 -S $OUTPUT --local```
```if [ $? -ne 0 ]; then```
```  echo "Sample 1 处理失败"```
```  exit 1```
```fi```
```echo "Sample 1 处理完成"```

```echo "开始处理 Sample 2"```
```READ1="/share/home/stu_yifan/chipseq/rawdata/trimmed_reads/Sample1_trimmed_R1.fastq.gz"```
```READ2="/share/home/stu_yifan/chipseq/rawdata/trimmed_reads/Sample1_trimmed_R2.fastq.gz"```
```OUTPUT="/share/home/stu_yifan/chipseq/rawdata/trimmed_reads/Sample1_aligned.sam"```
```echo "READ1: $READ1"```
```echo "READ2: $READ2"```
```echo "OUTPUT: $OUTPUT"```
```bowtie2 -p $THREADS -q -x $INDEX_PATH -1 $READ1 -2 $READ2 -S $OUTPUT --local```
```if [ $? -ne 0 ]; then```
  ```echo "Sample 2 处理失败"```
```  exit 1```
```fi```
```echo "Sample 2 处理完成"```

```echo "开始处理 Sample 3"```
```READ1="/share/home/stu_yifan/chipseq/rawdata/trimmed_reads/Sample2_trimmed_R1.fastq.gz"```
```READ2="/share/home/stu_yifan/chipseq/rawdata/trimmed_reads/Sample2_trimmed_R2.fastq.gz"```
```OUTPUT="/share/home/stu_yifan/chipseq/rawdata/trimmed_reads/Sample2_aligned.sam"```
```echo "READ1: $READ1"```
```echo "READ2: $READ2"```
```echo "OUTPUT: $OUTPUT"```
```bowtie2 -p $THREADS -q -x $INDEX_PATH -1 $READ1 -2 $READ2 -S $OUTPUT --local```
``if [ $? -ne 0 ]; then```
```  echo "Sample 3 处理失败"```
```  exit 1```
```fi```
```echo "Sample 3 处理完成"```
##2.4对sam文件进行排序并转换为bam文件
```对sam文件排序并输出为bam文件```
```samtools sort sample.sam -o sample.sorted.bam```
# calling peaks
测序数据比对完成后,要进行peaks分析,使用MACS2软件
```conda install -c bioconda macs2```
```macs2 2.2.9.1```
run_macs2.sh脚本的目的是使用MACS2软件对ChIP-seq数据进行峰值调用(peak calling),以识别蛋白质-DNA相互作用的特征区域。
##1. 设置输入文件和输出目录
```TREATED_BAMS=("/share/home/stu_yifan/chipseq/rawdata/sorted_bam/Sample1_sorted.bam" "/share/home/stu_yifan/chipseq/rawdata/sorted_bam/Sample2_sorted.bam")```
```CONTROL_BAM="/share/home/stu_yifan/chipseq/rawdata/sorted_bam/Input_sorted.bam"```
```OUTPUT_DIR="/share/home/stu_yifan/chipseq/rawdata/sorted_bam/macs2_output"```
(1)TREATED_BAMS:这是一个数组,包含了多个处理组的BAM文件路径,这些文件是ChIP-seq数据的排序后(sorted)的BAM格式文件。每个文件对应一个样本(例如Sample1、Sample2)
(2)CONTROL_BAM:这是输入对照组的BAM文件路径。对照组(Input)通常是ChIP-seq实验中没有进行抗体富集处理的样本,用于作为背景对照。
(3)OUTPUT_DIR:指定了MACS2峰值调用结果的输出目录路径,所有的输出文件将保存在这个目录下。
##2.创建输出目录(如果不存在)
```mkdir -p $OUTPUT_DIR```

##3. 运行MACS2对每个样本进行峰值调用
```for TREATED_BAM in "${TREATED_BAMS[@]}"```
```do```
```  SAMPLE_NAME=$(basename $TREATED_BAM _sorted.bam)```
```  macs2 callpeak -t $TREATED_BAM -c $CONTROL_BAM -f BAMPE -g hs -n $SAMPLE_NAME --outdir $OUTPUT_DIR```
```done```
for 循环:这个循环遍历数组TREATED_BAMS中的每一个元素,即每一个处理组的BAM文件。
SAMPLE_NAME=$(basename $TREATED_BAM _sorted.bam):
basename命令用于从文件路径中提取文件名,并去掉指定的后缀_sorted.bam。这将生成一个短的样本名称(例如Sample1),用于作为输出文件的前缀。
macs2 callpeak -t $TREATED_BAM -c $CONTROL_BAM -f BAMPE -g hs -n $SAMPLE_NAME --outdir $OUTPUT_DIR:
macs2 callpeak:这是调用MACS2进行峰值调用的命令。
-t $TREATED_BAM:指定输入的处理组BAM文件。
-c $CONTROL_BAM:指定输入的对照组BAM文件。
-f BAMPE:指定输入文件的格式为BAM配对末端(paired-end reads)。
-g hs:指定基因组大小为人类(hs表示homo sapiens),MACS2需要这个参数来计算统计显著性。
-n $SAMPLE_NAME:指定输出文件的前缀为样本名称(例如Sample1)。
--outdir $OUTPUT_DIR:指定输出文件保存的目录。
©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容