hbctraining-Introduction to ChIP-Seq Lesson 3

Alignment and Filtering


Part 1. Alignment

1. Alignment to Genome

Once we have assessed the quality of the sequence data, we are ready to align the reads to the reference genome. Bowtie2 is a fast and accurate aligner that indexes the genome with an FM index, based on the Burrows-Wheeler transform, to keep memory requirements low during alignment. Bowtie2 supports gapped, local and paired-end alignment modes and works best for reads that are at least 50 bp long (shorter reads, such as those from smRNA-Seq, should be aligned with Bowtie1). By default, Bowtie2 performs a global end-to-end read alignment, which is best for quality-trimmed reads. It also has a local alignment mode, which soft-clips poor-quality bases or adapters from untrimmed reads.

2. Bowtie2 Usage

* Creating a Bowtie2 index

        A genome index, analogous to the index at the back of a book, is required to perform the Bowtie2 alignment. We can generate the genome index with the following command:

        bowtie2-build <path_to_reference_genome.fa> <prefix_to_name_indexes>
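To make the two placeholders concrete, here is a minimal sketch of the index-building step. The paths `reference/genome.fa` and `reference/index/genome` are hypothetical and should be replaced with your own; the command is only printed, so nothing runs until the last line is uncommented.

```shell
# Hypothetical paths (assumptions, not from the lesson).
genome_fa="reference/genome.fa"
index_prefix="reference/index/genome"

# Assemble the command as a string so it can be inspected first.
build_cmd="bowtie2-build $genome_fa $index_prefix"
echo "$build_cmd"

# To actually build the index (writes files such as genome.1.bt2 next to the prefix;
# relies on word splitting, so the paths must not contain spaces):
# mkdir -p "$(dirname "$index_prefix")" && $build_cmd
```

The prefix chosen here is the value later passed to `bowtie2 -x`.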

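* Commonly used parameters in Bowtie2

     -p: number of processors/cores (alignment threads)

     -q: reads are in FASTQ format (the default)

     --local: local alignment mode, which performs soft-clipping

     -x: /path/to/genome_index_prefix (the basename given to bowtie2-build, not a directory)

     -S: /path/to/output/SAM_file

     -U: file(s) of single-end (unpaired) reads

     -1/-2: files of paired-end reads (mate 1 and mate 2)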
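The parameters above might be combined as in the following sketch. All file names and the thread count are illustrative assumptions; the commands are assembled as strings and printed rather than executed.

```shell
# Hypothetical inputs and outputs (assumptions, not from the lesson).
threads=2
index_prefix="reference/index/genome"
out_sam="sample.sam"

# Single-end reads, local alignment mode.
se_cmd="bowtie2 -p $threads -q --local -x $index_prefix -U sample.fastq -S $out_sam"

# Paired-end reads: -1 and -2 replace -U.
pe_cmd="bowtie2 -p $threads -q --local -x $index_prefix -1 sample_R1.fastq -2 sample_R2.fastq -S $out_sam"

echo "$se_cmd"
echo "$pe_cmd"

# To run one of them (paths must not contain spaces): $se_cmd
```

Dropping `--local` gives the default end-to-end mode, which suits reads that have already been quality-trimmed.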

3. Alignment file format: SAM/BAM

to be continued


Part 2. Filtering

An important issue with ChIP-Seq data concerns the inclusion of multi-mapped reads (reads that map to multiple loci on the reference genome). Allowing multi-mapped reads increases the number of usable reads and the sensitivity of peak detection; however, the number of false positives may also increase[1]. Therefore we need to filter our alignment files so that they contain only uniquely mapping reads, in order to increase confidence in site discovery and improve reproducibility. Since there is no parameter in Bowtie2 to keep only uniquely mapping reads, we will need to perform the following steps to generate alignment files containing only the uniquely mapping reads:

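1. Convert the alignment file from SAM to BAM with samtools view

Parameters used in this step:

-h: include header in output

-S: input is in SAM format (needed by old samtools versions; samtools 1.x detects the input format automatically and ignores this flag)

-b: output BAM format

-o: /path/to/output/file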
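As a minimal sketch of this conversion step, the flags above could be put together as follows. The file names `sample.sam` and `sample.bam` are placeholders, not names from the lesson, and the command is printed rather than run.

```shell
# Hypothetical file names (assumptions).
in_sam="sample.sam"
out_bam="sample.bam"

# -h keep header, -S input is SAM, -b write BAM, -o output file.
view_cmd="samtools view -h -S -b -o $out_bam $in_sam"
echo "$view_cmd"

# To run it (paths must not contain spaces): $view_cmd
```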

2. Sort the BAM file by genomic coordinates (sambamba sort or samtools sort)

The advantage of using sambamba is that an index file is generated along with the newly sorted file. With samtools this is a two-step process (samtools sort followed by samtools index).
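The two routes can be compared side by side in the sketch below. File names and the thread count are assumptions made for illustration; the commands are printed, not executed.

```shell
# Hypothetical file names and thread count (assumptions).
threads=2
in_bam="sample.bam"
sorted_bam="sample.sorted.bam"

# Route A: sambamba sorts and also writes an index for the sorted file.
sambamba_cmd="sambamba sort -t $threads -o $sorted_bam $in_bam"

# Route B: samtools sorts, then a second command builds the index.
samtools_sort_cmd="samtools sort -o $sorted_bam $in_bam"
samtools_index_cmd="samtools index $sorted_bam"

echo "$sambamba_cmd"
echo "$samtools_sort_cmd && $samtools_index_cmd"
```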

3. Filter to keep only uniquely mapping reads (this will also remove any unmapped reads and duplicates)

We can filter out multimappers using the XS tag, which the Bowtie2 manual describes as follows:

XS:i:<N> Alignment score for the best-scoring alignment found other than the alignment reported. Can be negative. Can be greater than 0 in --local mode (but not in --end-to-end mode). Only present if the SAM record is for an aligned read and more than one alignment was found for the read. Note that, when the read is part of a concordantly-aligned pair, this score could be greater than AS:i.

Alternatively, we can filter by mapping quality (MAPQ).
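To illustrate how the two criteria differ, the sketch below applies both to four toy SAM records using plain awk. The records and their scores are made up for illustration, and the MAPQ cutoff of 30 is an assumption; in practice the filtering would be done on BAM files with sambamba or samtools.

```shell
# Toy SAM records (tab-separated); every value here is invented for illustration.
toy_sam() {
  printf '@SQ\tSN:chr1\tLN:1000\n'
  # read1: mapped, no XS tag, high MAPQ
  printf 'read1\t0\tchr1\t100\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tAS:i:0\n'
  # read2: mapped, XS equals AS (a true multimapper), low MAPQ
  printf 'read2\t0\tchr1\t200\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tAS:i:0\tXS:i:0\n'
  # read3: unmapped (FLAG bit 0x4 set)
  printf 'read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tYT:Z:UU\n'
  # read4: mapped, a worse second alignment exists (XS present), MAPQ still high
  printf 'read4\t0\tchr1\t300\t39\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tAS:i:0\tXS:i:-5\n'
}

# Criterion 1: keep mapped reads that carry no XS tag at all.
kept_xs=$(toy_sam | awk -F '\t' '
  /^@/ { next }                    # skip header lines
  int($2 / 4) % 2 == 1 { next }    # FLAG bit 0x4 set: unmapped
  {
    has_xs = 0
    for (i = 12; i <= NF; i++) if ($i ~ /^XS:i:/) has_xs = 1
    if (!has_xs) print $1
  }')

# Criterion 2: keep mapped reads with MAPQ >= 30 (column 5).
kept_mapq=$(toy_sam | awk -F '\t' '
  /^@/ { next }
  int($2 / 4) % 2 == 1 { next }
  $5 >= 30 { print $1 }')

echo "no XS tag: $kept_xs"
echo "MAPQ >= 30:" $kept_mapq
```

In this toy data the XS criterion is the stricter one: it drops read4 because a second alignment was found at all, whereas the MAPQ criterion keeps it because the reported alignment is clearly better than the alternative.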

* For sambamba

-t: number of threads / cores

-h: print SAM header before reads

-f: format of output file (default is SAM)

-F: set custom filter - we will be using the filter to remove duplicates, multimappers and unmapped reads.

sambamba view -h -t 2 -f bam -F "[XS] == null and not unmapped and not duplicate" sorted.bam > sort.filter.bam

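* For samtools

Here -f 2 keeps only properly paired reads (so it applies to paired-end data), -q 30 keeps reads with MAPQ of at least 30, and -u writes uncompressed BAM to the pipe. Note that this command does not remove duplicates.

samtools view -Shub -f 2 -q 30 "$sam" | samtools sort -T "$path/$sample" -o "$filter_bam" -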
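Both tools decide "unmapped", "duplicate" and "properly paired" from bits in the SAM FLAG field. The sketch below decodes a few example FLAG values with shell arithmetic; the helper `has_flag` and the example values are hypothetical and only meant to show what the filters test.

```shell
# Return success if SAM FLAG value $1 has bit $2 set.
has_flag() {
  [ $(( $1 & $2 )) -ne 0 ]
}

PROPER_PAIR=2     # 0x2, selected by samtools view -f 2
UNMAPPED=4        # 0x4, "unmapped" in a sambamba filter
DUPLICATE=1024    # 0x400, "duplicate" in a sambamba filter

# Example FLAG values chosen for illustration.
for flag in 99 4 1123; do
  status=""
  has_flag "$flag" "$PROPER_PAIR" && status="$status proper_pair"
  has_flag "$flag" "$UNMAPPED"    && status="$status unmapped"
  has_flag "$flag" "$DUPLICATE"   && status="$status duplicate"
  echo "flag $flag:$status"
done
```

A read with FLAG 1123 would pass `-f 2` even though it is marked as a duplicate, which is why the samtools route needs a separate duplicate-removal step if duplicates should be excluded.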

TO BE CONTINUED


   
