S3norm -- a normalization tool for epigenomic data

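Introduction:
S3norm normalizes data with a monotonic nonlinear transformation that matches both the sequencing depth and the signal background level across data sets, so that the normalized signals better reflect the biological differences between epigenomic data sets.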

The S3norm paper: https://academic.oup.com/nar/article/48/8/e43/5747479

1 Installation
1.1 Create the environment

conda create -n S3norm  python=2.7

Note that the Python version used here is still 2.7.

conda activate S3norm

1.2 Clone the GitHub repository

git clone https://github.com/guanjue/S3norm.git

1.3 Install the dependencies

pip install numpy scipy

2 Prepare the input files
2.1 Convert the sorted, deduplicated BAM file to bedGraph

bamCoverage --bam input.bam  -o ./out.bed -of bedgraph -bs 10 -p 8  --minMappingQuality 30  -e  150

2.2 Sort the resulting bed file

sort -k1,1 -k2,2n ./out.bed  > ./out.bed_sorted
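
Since the sorted files feed directly into the merge in step 2.3, it can be worth confirming the order before moving on. The following is a minimal sketch, not part of S3norm: the helper name `check_bedgraph_sorted` is hypothetical, and it relies only on the `-c` (check) option of `sort` with the same keys as the command above.

```shell
# Hypothetical helper: succeeds only if the file is already ordered by
# chromosome (column 1), then numerically by start position (column 2).
check_bedgraph_sorted() {
    # -c checks the order without writing any output; a non-zero exit
    # status means at least one line is out of place.
    sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}
```

For example, `check_bedgraph_sorted 1-ATAC_0G.bed_sorted && echo "sorted"` prints `sorted` only when the file passes the check.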

2.3 Merge the intervals of multiple samples (for a detailed explanation of this command, see https://www.jianshu.com/p/f8bbd51b5199)

bedtools unionbedg -i 1-ATAC_0G.bed_sorted 2-ATAC_50G.bed_sorted 3-ATAC_80G.bed_sorted > ATAC_3_samples

2.4 Split the merged file back into one file per sample

cut -f 1,2,3,4 ATAC_3_samples > 1-ATAC_0G.input
cut -f 1,2,3,5 ATAC_3_samples > 2-ATAC_50G.input
cut -f 1,2,3,6 ATAC_3_samples > 3-ATAC_80G.input
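
With more than a few samples, typing one `cut` command per column gets error-prone. The three commands above can be sketched as a loop; the helper name `split_union` is hypothetical, and the sketch assumes the sample names are passed in the same order as the files given to `bedtools unionbedg -i`.

```shell
# Hypothetical helper: split_union UNION_FILE SAMPLE1 SAMPLE2 ...
# Writes SAMPLEk.input with columns 1-3 plus the k-th signal column.
split_union() {
    union_file=$1
    shift
    col=4    # the first signal column follows chrom, start, end
    for sample in "$@"; do
        cut -f "1-3,$col" "$union_file" > "$sample.input"
        col=$((col + 1))
    done
}
```

Calling `split_union ATAC_3_samples 1-ATAC_0G 2-ATAC_50G 3-ATAC_80G` would write the same three `.input` files as the commands above.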

2.5 Generate the control file (choose a control that suits your own data)

awk '{print $1"\t"$2"\t"$3"\t""1"}' ATAC_3_samples > control_s3norm_input

2.6 Create the list of files to normalize; for this example the list (named file_list.txt) should look like this

1-ATAC_0G.input  control_s3norm_input
2-ATAC_50G.input control_s3norm_input
3-ATAC_80G.input  control_s3norm_input
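
The list can also be generated rather than typed by hand. This is a sketch under one assumption: that the two columns (target file, then control file) are separated by a tab, which should be checked against the S3norm README. The helper name `write_file_list` is hypothetical.

```shell
# Hypothetical helper: write_file_list CONTROL TARGET1 TARGET2 ...
# Prints one "target<TAB>control" line per target file.
write_file_list() {
    control=$1
    shift
    for target in "$@"; do
        printf '%s\t%s\n' "$target" "$control"
    done
}
```

For this example: `write_file_list control_s3norm_input 1-ATAC_0G.input 2-ATAC_50G.input 3-ATAC_80G.input > file_list.txt`.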

Take a look at the input files:

head *input

head 1-ATAC_0G.input
chr1    7000    7200    0
chr1    18800   19000   0
chr1    62400   62600   5.02
chr1    63800   64000   188.21
chr1    95600   95800   16.41
chr1    136000  136200  0
chr1    156000  156200  0
chr1    158800  159000  0
chr1    206400  206600  51.87
chr1    217000  217200  0

head 2-ATAC_50G.input
chr1    7000    7200    0
chr1    18800   19000   0
chr1    62400   62600   0
chr1    63800   64000   2.66
chr1    95600   95800   0
chr1    136000  136200  50.26
chr1    156000  156200  0
chr1    158800  159000  0
chr1    206400  206600  0
chr1    217000  217200  0

head 3-ATAC_80G.input
chr1    7000    7200    0
chr1    18800   19000   0
chr1    62400   62600   0
chr1    63800   64000   0
chr1    95600   95800   0
chr1    136000  136200  0
chr1    156000  156200  0
chr1    158800  159000  0
chr1    206400  206600  0
chr1    217000  217200  0

head control_s3norm_input
chr1    7000    7200    1
chr1    18800   19000   1
chr1    62400   62600   1
chr1    63800   64000   1
chr1    95600   95800   1
chr1    136000  136200  1
chr1    156000  156200  1
chr1    158800  159000  1
chr1    206400  206600  1
chr1    217000  217200  1
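
In the listings above, every file covers the same bins in the same order, which is what the merge and split steps in 2.3 to 2.5 are meant to produce. A quick way to confirm this for whole files is to compare a checksum of the first three columns. This is a minimal sketch; the helper name `same_bins` is hypothetical.

```shell
# Hypothetical helper: same_bins REFERENCE FILE1 FILE2 ...
# Succeeds only if every file has the same chrom/start/end columns,
# in the same order, as the reference file.
same_bins() {
    ref=$1
    shift
    ref_sum=$(cut -f 1-3 "$ref" | cksum)
    for f in "$@"; do
        if [ "$(cut -f 1-3 "$f" | cksum)" != "$ref_sum" ]; then
            echo "bins differ from $ref: $f" >&2
            return 1
        fi
    done
}
```

For example: `same_bins control_s3norm_input 1-ATAC_0G.input 2-ATAC_50G.input 3-ATAC_80G.input`.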

3 Run S3norm
3.1 Write the run script in the directory that contains the input files

### Directory containing the S3norm code
script_directory='/where_user_clone_the_S3norm_GitHub/S3norm/'
### Directory containing the input files
working_directory='./example_file/'
### Run S3norm
time python $script_directory'/src/s3norm_pipeline.py' -s $script_directory'/src/' -t ./file_list.txt
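
A missing input file is easier to catch before the pipeline starts than partway through a run. The sketch below is a pre-flight check that could be run ahead of the command above; it is not part of S3norm, and the helper name `check_file_list` is hypothetical. It assumes each line of the list holds a target file name followed by a control file name, separated by whitespace.

```shell
# Hypothetical helper: check_file_list LIST
# Reports every file named in LIST that does not exist and
# returns non-zero if any is missing.
check_file_list() {
    status=0
    while read -r target control; do
        [ -n "$target" ] || continue    # skip blank lines
        for f in "$target" "$control"; do
            if [ ! -f "$f" ]; then
                echo "missing: $f" >&2
                status=1
            fi
        done
    done < "$1"
    return "$status"
}
```

For example: `check_file_list ./file_list.txt && echo "all inputs present"`.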

3.2 Inspect the output files
S3norm produces three types of output files:

(1) The S3norm-normalized read counts (saved in 'S3norm_rc_bedgraph/').
(2) The negative log10 p-value of the S3norm-normalized read counts, based on a negative binomial background model (saved in 'NBP_bedgraph/').
(3) The S3norm-normalized negative log10 p-value, based on a negative binomial background model (saved in 'S3norm_NBP_bedgraph/').

4 Visualize the normalized read counts

4.1 Sort the files under S3norm_rc_bedgraph/

sort -k1,1 -k2,2n 1-ATAC_0G.bedgraph.s3norm.bedgraph > 1-ATAC_0G.bedgraph.s3norm.bedgraph_sorted
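
The same sort has to be repeated for each sample. A sketch of a loop over every `.bedgraph` file in a directory follows; the helper name `sort_all_bedgraphs` is hypothetical, and the `_sorted` suffix follows the naming used in the command above.

```shell
# Hypothetical helper: sort_all_bedgraphs DIRECTORY
# Writes FILE_sorted next to every FILE ending in .bedgraph.
sort_all_bedgraphs() {
    for f in "$1"/*.bedgraph; do
        [ -e "$f" ] || continue    # the glob matched nothing
        sort -k1,1 -k2,2n "$f" > "${f}_sorted"
    done
}
```

For example: `sort_all_bedgraphs S3norm_rc_bedgraph`. Because the output names end in `_sorted`, running the helper a second time does not pick up files it has already written.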

4.2 Convert the bedGraph file to bigWig
bedGraphToBigWig and a genome (chromosome) sizes file must be downloaded beforehand.

~/tools/bedgraphtobigwig/bin/bedGraphToBigWig  1-ATAC_0G.bedgraph.s3norm.bedgraph_sorted ~/genome/tair.sizes.genome 1-ATAC_0G.bigwig
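
If no chromosome sizes file is at hand, one can be derived from a FASTA index. The sketch below assumes that a `samtools faidx` index (`.fai`) of the same reference genome exists; its first two columns are the sequence name and length, which is the two-column layout bedGraphToBigWig reads. The helper name `make_chrom_sizes` and the file name `genome.fa.fai` are hypothetical.

```shell
# Hypothetical helper: make_chrom_sizes GENOME.fa.fai
# Prints "name<TAB>length" for each sequence in the index.
make_chrom_sizes() {
    cut -f 1,2 "$1"
}
```

For example: `make_chrom_sizes genome.fa.fai > ~/genome/tair.sizes.genome`. The chromosome names must match those used in the bedGraph files.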

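Closing notes
The S3norm output files have many other uses; this post only shows how to visualize the normalized signal.
Comments and corrections are welcome.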
