Software 15: exomePeak2

I. Overview

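exomePeak2 performs peak calling and computes peak statistics from BAM files. It bundles the routine steps of MeRIP-seq data analysis:

scanMeripBAM checks the BAM indexes.
exomePeakCalling identifies modified peaks in exonic regions.
normalizeGC estimates GC-content bias.
glmM or glmDM fits a generalized linear model to compute peak statistics (glmM) or differentially modified sites (glmDM).
exportResults writes out the peak results.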

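exomePeak2 identifies RNA modification peaks and differential peaks, and computes peak statistics from the BAM files of a MeRIP-seq experiment. A transcript annotation (a TxDb object or a GFF file) should be supplied so that the analysis is carried out on exons. A genome name or a BSgenome object is needed for GC-content bias correction; if the genome argument is not supplied, the analysis proceeds without GC correction. If BAM files for treated samples are supplied through bam_ip_treated and bam_input_treated, statistics for differential modification at peaks/sites are reported as well.

With default settings, exomePeak2 saves the results of the (differential) modification analysis in a folder named "exomePeak2_output". The output consists of a BED file, an RDS file, and a CSV table that stores the positions and statistics of the (differentially) modified peaks/sites.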

II. Usage

(1) Usage

# Install
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("exomePeak2")

# Load
library(exomePeak2)

# Annotation file
gtf <- "annotation/genome/Drosophila_melanogaster.BDGP6.32.109.gtf"

# Input samples
input_bam <- c("06_markdup/LF4.Input.markdup.bam",
               "06_markdup/LF5.Input.markdup.bam")

# IP samples (same order as input_bam)
ip_bam <- c("06_markdup/LF4.IP.markdup.bam",
            "06_markdup/LF5.IP.markdup.bam")

# Sample names
name <- c("LF4", "LF5")

# Batch peak calling, one IP/input pair per sample.
# No genome argument is given, so GC-content correction is skipped.
result <- lapply(seq_along(name), function(x) {
  # Results go to 11_exomepeak/<sample name>/
  dir.create(file.path("11_exomepeak", name[x]),
             recursive = TRUE, showWarnings = FALSE)
  exomePeak2(gff = gtf,
             bam_ip = ip_bam[x],
             bam_input = input_bam[x],
             save_dir = "11_exomepeak/",
             experiment_name = name[x],
             strandness = "1st_strand",
             parallel = 4,
             p_cutoff = 1e-10,
             fragment_length = 150)
})

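# Two-group comparison (differential modification)
# Note: index 5 assumes ip_bam and input_bam list every sample of the experiment.
# The two-element vectors defined above have no fifth entry (indexing returns NA),
# so extend them with the control-group BAM files or adjust the indices first.
exomePeak2(gff = gtf,
           bam_ip = ip_bam[5],
           bam_input = input_bam[5],
           bam_ip_treated = ip_bam[1],
           bam_input_treated = input_bam[1],
           save_dir = "11_exomepeak/",
           experiment_name = name[1],
           strandness = "1st_strand",
           parallel = 4,
           p_cutoff = 1e-10,
           fragment_length = 150)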
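
Both calls above rely on ip_bam and input_bam listing the samples in the same order, and an out-of-range or mismatched index is easy to miss. As a minimal sketch of a sanity check, the Python snippet below pairs BAM files by sample name before they are handed to R. It assumes the naming pattern <sample>.<IP|Input>.markdup.bam seen in the paths above; the helper pair_bams is hypothetical and not part of exomePeak2.

```python
import re
from collections import defaultdict

# Assumed naming pattern, taken from the paths above:
#   <sample>.<IP|Input>.markdup.bam
BAM_PATTERN = re.compile(r"(?P<sample>[^/.]+)\.(?P<kind>IP|Input)\.markdup\.bam$")


def pair_bams(paths):
    """Group BAM paths into {sample: {"IP": path, "Input": path}}.

    Raises ValueError if a file name does not follow the assumed pattern
    or if a sample lacks either its IP or its Input BAM.
    """
    pairs = defaultdict(dict)
    for path in paths:
        match = BAM_PATTERN.search(path)
        if match is None:
            raise ValueError("unrecognised BAM name: " + path)
        pairs[match.group("sample")][match.group("kind")] = path

    incomplete = sorted(
        sample for sample, kinds in pairs.items() if set(kinds) != {"IP", "Input"}
    )
    if incomplete:
        raise ValueError("samples missing an IP or Input BAM: " + ", ".join(incomplete))
    return dict(pairs)


paths = [
    "06_markdup/LF4.Input.markdup.bam",
    "06_markdup/LF5.Input.markdup.bam",
    "06_markdup/LF4.IP.markdup.bam",
    "06_markdup/LF5.IP.markdup.bam",
]
pairs = pair_bams(paths)
for sample in sorted(pairs):
    print(sample, pairs[sample]["IP"], pairs[sample]["Input"])
```

Keying the pairs by sample name rather than by vector position means a missing file raises an error up front instead of silently passing NA to exomePeak2.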

(2) exomePeak2 parameters

exomePeak2(
  bam_ip = NULL,  # character vector of paths to the IP-sample BAM files
  bam_input = NULL,  # character vector of paths to the input-sample BAM files
  bam_ip_treated = NULL,  # character vector of paths to the treated IP-sample BAM files; only needed for differential methylation analysis
  bam_input_treated = NULL,  # character vector of paths to the treated input-sample BAM files; only needed for differential methylation analysis
  txdb = NULL,  # TxDb object holding the transcript annotation
  genome = NULL,  # reference genome (genome name or BSgenome object), used for GC correction
  gff = NULL,  # path to a GFF/GTF gene annotation file; use this when no TxDb object is available
  strandness = c("unstrand", "1st_strand", "2nd_strand"),
  fragment_length = 100,  # positive integer, the expected fragment length in bp; default = 100
  bin_size = 25,
  step_size = 25,
  test_method = c("Poisson", "DESeq2"),
  p_cutoff = 1e-10,  # p-value cutoff for peak calling; with test_method = "DESeq2" a larger cutoff (e.g. 0.001) is often required
  parallel = 1,
  plot_gc = TRUE,  # save plots of the bins' GC content vs. the bins' fitted coverage curves
  save_output = TRUE,
  save_dir = getwd(),
  experiment_name = "exomePeak2_output",  # name of the folder created in the output directory to hold all results
  mode = c("exon", "full_transcript", "whole_genome"),  # scope of peak calling on the genome
  motif_based = FALSE,  # detect modifications at motif sites; if TRUE, sliding windows are replaced by single-base sites of the modification motif
  motif_sequence = "DRACH"  # motif sequence used for the reference sites
)
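
To make the motif_sequence argument concrete: "DRACH" is written in IUPAC nucleotide codes (D = A/G/T, R = A/G, H = A/C/T), so it stands for a family of 5-base sequences with an A in the third position. The sketch below is Python rather than R and is not part of exomePeak2. The helpers motif_to_regex and find_motif_sites are hypothetical names, and the code only shows which sequences the motif matches, not how the package locates sites internally.

```python
import re

# Standard IUPAC nucleotide codes (subset sufficient for motifs such as DRACH).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "D": "[AGT]", "H": "[ACT]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate an IUPAC motif string into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif_sites(sequence, motif="DRACH"):
    """Return 0-based start positions of all motif matches, overlaps included."""
    # The lookahead lets finditer report matches that overlap each other.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


print(motif_to_regex("DRACH"))
print(find_motif_sites("AAACAGGACT"))
```

For a match starting at position p, the A in the motif sits at position p + 2 of the (DNA-alphabet) sequence.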

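(3) Intersecting biological replicates

Intersect the peaks of the biological replicates and filter out peak pairs whose overlap is below 50%, using a small Python script provided by the author (originally written for Python 2).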

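$ vi shellscript/exomePeak2_intersect.py
# Filter out intersections whose overlap proportion is below 0.5.
# Input: output of `bedtools intersect -wo` on two BED12 files
# (columns 1-12: peak A, columns 13-24: peak B, column 25: overlapping bases).
from __future__ import division, print_function  # runs under Python 2 and 3
import sys

with open(sys.argv[1], "r") as handle:
    for line in handle:
        line = line.strip()
        info = line.split("\t")
        # Peak length = sum of the BED12 blockSizes (tolerates a trailing comma)
        a_len = sum(int(i) for i in info[10].split(",") if i)
        b_len = sum(int(i) for i in info[22].split(",") if i)
        o_len = int(info[24])
        if (int(info[1]) - int(info[13])) * (int(info[2]) - int(info[14])) <= 0:
            # One peak's span lies entirely within the other's: always keep
            print(line)
        elif (o_len / a_len) >= 0.5 or (o_len / b_len) >= 0.5:
            # Otherwise keep only if the overlap covers at least half of either peak
            print(line)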
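
The rule applied by the script can be checked on synthetic records. The sketch below restates it as a Python 3 function and runs it on three hand-made `bedtools intersect -wo` rows. The function keep_pair and the helper bed12 are hypothetical names introduced for this illustration, and the coordinates are made up.

```python
def keep_pair(fields, min_fraction=0.5):
    """Return True if one `bedtools intersect -wo` record of two BED12 peaks is kept.

    fields[0:12] is peak A, fields[12:24] is peak B, fields[24] is the overlap in bases.
    """
    a_start, a_end = int(fields[1]), int(fields[2])
    b_start, b_end = int(fields[13]), int(fields[14])
    # Peak length = sum of BED12 blockSizes (exonic bases only).
    a_len = sum(int(x) for x in fields[10].split(",") if x)
    b_len = sum(int(x) for x in fields[22].split(",") if x)
    overlap = int(fields[24])

    # One peak's span contains the other's: keep unconditionally.
    if (a_start - b_start) * (a_end - b_end) <= 0:
        return True
    # Partial overlap: keep if it covers at least min_fraction of either peak.
    return overlap / a_len >= min_fraction or overlap / b_len >= min_fraction


def bed12(chrom, start, end, name, strand="+"):
    """Build a single-block BED12 record (hypothetical helper for this demo)."""
    size = end - start
    return [chrom, str(start), str(end), name, "0", strand,
            str(start), str(end), "0", "1", str(size), "0"]


# Peak b nested inside peak a.
nested = bed12("2L", 100, 400, "a") + bed12("2L", 150, 300, "b") + ["150"]
# Only 20 overlapping bases between a 300-base and a 520-base peak.
partial = bed12("2L", 100, 400, "a") + bed12("2L", 380, 900, "b") + ["20"]
# Overlap is exactly half of peak a.
half = bed12("2L", 100, 300, "a") + bed12("2L", 200, 900, "b") + ["100"]

results = [keep_pair(record) for record in (nested, partial, half)]
print(results)  # [True, False, True]
```

Because the threshold is tested against either peak, a short peak that is mostly covered by a much longer one still passes.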

$ mkdir -p 11_intersect
$ for i in WTF WTM; do bedtools intersect -a 11_exomepeak/${i}4/peaks.bed -b 11_exomepeak/${i}5/peaks.bed -s -split -wo > 11_intersect/${i}_tmp_file; done

$ for i in WTF WTM; do python shellscript/exomePeak2_intersect.py 11_intersect/${i}_tmp_file | cut -f 1-12 | sort | uniq > 11_intersect/${i}_intersect.bed; done

$ rm 11_intersect/*_tmp_file
$ wc -l 11_intersect/*
