I just discovered that our HPC cluster ships the well-known ENCODE ATAC-seq analysis pipeline, which an individual user can also install and invoke easily. All you need to provide is a JSON file describing your samples, laid out following the pipeline's examples.
Additional background reading:
The shift-BAM issue
MACS2 parameters
MACS2 home page
ATAC-Seq analysis pipeline
ATAC-seq fragment-length periodicity plot
Question: ATAC-seq peak calling with MACS
When calling peaks myself, I use the following parameters:
macs2 callpeak -t input.bam -f BAMPE -g "mm" -p 0.01 -n name --nomodel --shift -37 --extsize 73 -B --SPMR --keep-dup all --call-summits
Here are some examples for combining --shift and --extsize:
To find enriched cutting sites such as some DNAse-Seq datasets. In this case, all 5' ends of sequenced reads should be extended in both directions to smooth the pileup signals. If the wanted smoothing window is 200bps, then use --nomodel --shift -100 --extsize 200.
For certain nucleosome-seq data, we need to pile up the centers of nucleosomes using a half-nucleosome size for wavelet analysis (e.g. NPS algorithm). Since the DNA wrapped on nucleosome is about 147bps, this option can be used: --nomodel --shift 37 --extsize 73.
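The arithmetic behind --shift and --extsize is easy to check by hand. A minimal sketch for a plus-strand read (the cut-site coordinate 1000 is a made-up example; MACS2 first shifts the 5' end, then extends it toward 3' by --extsize):

```shell
# Hypothetical illustration of MACS2's --nomodel pileup interval.
shift=-37     # --shift: applied to the 5' end first
extsize=73    # --extsize: the read is then extended to this fixed length
cut_site=1000 # assumed 5' end of a plus-strand read

start=$(( cut_site + shift ))   # 1000 - 37 = 963
end=$(( start + extsize ))      # 963 + 73 = 1036
echo "pileup interval: [${start}, ${end})"
# prints "pileup interval: [963, 1036)" -- centered on the Tn5 cut site
```

Note that the center of [963, 1036) is ~1000, i.e. the interval is symmetric around the cut site, which is exactly what --shift -37 --extsize 73 is meant to achieve for ATAC-seq.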
Below is the script I use to generate the JSON files in batch:
json_dir=/your_dir/jsons
fastq_dir=/your_dir/atac_seq/fastq
cd ${fastq_dir}
for i in $(ls *.fastq.gz | sed 's/_[12].fastq.gz//' | sort -u); do
json_file="${json_dir}/${i}.json.1.9.0"
out_dir=${json_dir}/${i}
mkdir -p $out_dir
echo "{
\"atac.pipeline_type\" : \"atac\",
\"atac.genome_tsv\" : \"/fdb/encode-atac-seq-pipeline/v3/mm10/mm10.tsv\",
\"atac.mito_chr_name\": \"chrM\",
\"atac.fastqs_rep1_R1\" :
[\"${fastq_dir}/${i}_1.fastq.gz\"],
\"atac.fastqs_rep1_R2\" :
[\"${fastq_dir}/${i}_2.fastq.gz\"],
\"atac.paired_end\" : true,
\"atac.multimapping\" : 4,
\"atac.auto_detect_adapter\" : true,
\"atac.smooth_win\" : 73,
\"atac.enable_idr\" : true,
\"atac.idr_thresh\" : 0.05,
\"atac.title\" : \"${i}\",
\"atac.description\" : \"ATAC-seq on Mus musculus C57BL/6 \"
}
"> $json_file
mv $json_file $out_dir
done
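Before submitting anything, it is worth confirming that every generated file actually parses as JSON (a stray unescaped quote would silently corrupt it). A minimal self-contained sketch using python3's built-in json.tool; "sampleA" and the temp directory are stand-ins for the real json_dir layout above:

```shell
# Generate one toy input JSON, then validate every *.json.1.9.0 found.
tmp=$(mktemp -d)
i=sampleA
mkdir -p "${tmp}/${i}"
json_file="${tmp}/${i}/${i}.json.1.9.0"
printf '{ "atac.title" : "%s", "atac.paired_end" : true }\n' "$i" > "$json_file"

# json.tool exits non-zero on a parse error, so this loop flags bad files
for f in "${tmp}"/*/*.json.1.9.0; do
  if python3 -m json.tool "$f" > /dev/null; then
    echo "OK: $f"
  else
    echo "INVALID: $f"
  fi
done
```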
Then just submit everything in batch:
#!/bin/bash
#SBATCH --job-name=sbatch.atac_seq.job
#SBATCH --time=1:00:00
#SBATCH --cpus-per-task=2
#SBATCH --mem=2g
dir=/your_dir/atac_seq
json_dir=${dir}/scripts/jsons
out_dir=${dir}/results
job_dir=${dir}/jobs
log_dir=${dir}/logs
mkdir -p ${job_dir}
mkdir -p ${log_dir}
cd ${json_dir}
threads=5
for i in $(ls -d *); do
job_file="${job_dir}/${i}.atac_seq.job"
echo "#!/bin/bash
#SBATCH --job-name=${i}.atac_seq.job
#SBATCH --output=$log_dir/${i}.out
#SBATCH --time=48:00:00
#SBATCH --cpus-per-task=${threads}
#SBATCH --mem=20g
module load encode-atac-seq-pipeline/1.9.0
cd ${json_dir}/${i}
caper run \$EASP_WDL -i ${i}.json.1.9.0
rc=\$?
final_dir=${out_dir}/${i}
mkdir -p \${final_dir}
croo --method copy --out-dir=\${final_dir} \\
${json_dir}/${i}/atac/*/metadata.json
exit \$rc
" > $job_file
sbatch $job_file
done
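Once the jobs finish, each sample's croo-organized output directory should contain a qc/qc.json (path assumed from the pipeline's usual croo layout; verify against your own results tree). A sketch that scans for them, run here against a fake results tree so it is self-contained:

```shell
# Scan a results directory for per-sample qc/qc.json files.
# out_dir and the "sampleA" entry are fabricated for illustration.
out_dir=$(mktemp -d)
mkdir -p "${out_dir}/sampleA/qc"
echo '{"general": {"title": "sampleA"}}' > "${out_dir}/sampleA/qc/qc.json"

for qc in "${out_dir}"/*/qc/qc.json; do
  # sample name is the directory two levels above qc.json
  sample=$(basename "$(dirname "$(dirname "$qc")")")
  echo "found QC for ${sample}: ${qc}"
done
```

From here the individual qc.json files can be merged into one table for cross-sample comparison.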