Genome alignment with Bowtie 2

These are my notes on the tools I use for ChIP-seq / CUT&Tag analysis. This post only gives a brief introduction to how each tool is used.

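Bowtie 2 is a widely used aligner for mapping sequencing reads to a reference genome. I will not go into how it works here; interested readers can consult the official documentation and the original publication (https://doi.org/10.1038/nmeth.1923). Below is a brief overview of the Bowtie 2 index and alignment commands, together with the parameters I commonly use.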

Usage

Index

bowtie2-build [options]* <reference_in> <bt2_base>

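<reference_in>: the reference to index. With -f (the default), this is a comma-separated list of FASTA files; with -c, the reference sequences themselves are given on the command line as a comma-separated list, e.g. GGTCATCCT,ACGGGTCGT,CCGTTCTATGCGGCTTA.
<bt2_base>: the prefix of the generated index files. By default, bowtie2-build writes NAME.1.bt2, NAME.2.bt2, NAME.3.bt2, NAME.4.bt2, NAME.rev.1.bt2 and NAME.rev.2.bt2, where NAME is <bt2_base>.
--threads: number of threads to use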

Example

bowtie2-build -f /public/Reference/GRCh38.primary_assembly.genome.fa --threads 24 GRCh38

The command above builds an index from the FASTA file /public/Reference/GRCh38.primary_assembly.genome.fa and writes index files with the prefix GRCh38 to the current directory.
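After building an index, it can be handy to confirm that all expected files were written before starting a long alignment job. The following is a minimal Python sketch, assuming the six .bt2 file names listed above; the helper names `expected_index_files` and `missing_index_files` are my own and not part of Bowtie 2. It does not cover the large-index variant, whose files end in .bt2l.

```python
from pathlib import Path

# Suffixes of a standard (small) Bowtie 2 index, as listed above.
INDEX_SUFFIXES = [
    ".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2",
    ".rev.1.bt2", ".rev.2.bt2",
]


def expected_index_files(bt2_base):
    """Return the six index file paths that share the prefix bt2_base."""
    return [Path(bt2_base + suffix) for suffix in INDEX_SUFFIXES]


def missing_index_files(bt2_base):
    """Return the expected index files that do not exist on disk."""
    return [path for path in expected_index_files(bt2_base) if not path.exists()]


if __name__ == "__main__":
    # "GRCh38" is the prefix used in the example command above.
    for path in missing_index_files("GRCh38"):
        print("missing:", path)
```

An empty result from `missing_index_files` suggests the index is complete, although it says nothing about whether the files themselves are intact.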

Alignment

Single-end alignment

bowtie2 [options]* -x <bt2-idx> -U <fq> -S <sam_output> -p <threads> 2>Align.summary

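-x: prefix (including the path) of the reference genome index files
-U: FASTQ file with the single-end reads
-S: output SAM file containing the alignments
-p: number of threads to use
"2>Align.summary": redirects standard error, which would otherwise be printed to the screen, to the file "Align.summary". Bowtie 2 writes its alignment summary there, typically in the following format: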

## Single-end
20000 reads; of these:
  20000 (100.00%) were unpaired; of these:
    1247 (6.24%) aligned 0 times
    18739 (93.69%) aligned exactly 1 time
    14 (0.07%) aligned >1 times
93.77% overall alignment rate

## Paired-end
10000 reads; of these:
  10000 (100.00%) were paired; of these:
    650 (6.50%) aligned concordantly 0 times
    8823 (88.23%) aligned concordantly exactly 1 time
    527 (5.27%) aligned concordantly >1 times
    ----
    650 pairs aligned concordantly 0 times; of these:
      34 (5.23%) aligned discordantly 1 time
    ----
    616 pairs aligned 0 times concordantly or discordantly; of these:
      1232 mates make up the pairs; of these:
        660 (53.57%) aligned 0 times
        571 (46.35%) aligned exactly 1 time
        1 (0.08%) aligned >1 times
96.70% overall alignment rate
The indentation indicates how subtotals relate to totals.
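To make the relationship between the subtotals and the overall alignment rate concrete, here is a small Python sketch that parses the paired-end summary shown above and recomputes the final percentage. The formula (aligned mates divided by all mates) is my own reading of the report; it reproduces the 96.70% in this example. The function names are hypothetical.

```python
import re

# The paired-end summary from above, copied verbatim.
SUMMARY = """\
10000 reads; of these:
  10000 (100.00%) were paired; of these:
    650 (6.50%) aligned concordantly 0 times
    8823 (88.23%) aligned concordantly exactly 1 time
    527 (5.27%) aligned concordantly >1 times
    ----
    650 pairs aligned concordantly 0 times; of these:
      34 (5.23%) aligned discordantly 1 time
    ----
    616 pairs aligned 0 times concordantly or discordantly; of these:
      1232 mates make up the pairs; of these:
        660 (53.57%) aligned 0 times
        571 (46.35%) aligned exactly 1 time
        1 (0.08%) aligned >1 times
96.70% overall alignment rate
"""


def count(phrase, text):
    """Return the leading integer of the line 'N (x%) <phrase>'."""
    pattern = r"^\s*(\d+) \([\d.]+%\) " + re.escape(phrase) + r"\s*$"
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise ValueError(f"line not found: {phrase!r}")
    return int(match.group(1))


def overall_rate_paired(text):
    """Recompute the overall alignment rate (in percent) of a paired-end run."""
    total_pairs = int(re.search(r"^(\d+) reads; of these:", text, re.MULTILINE).group(1))
    # Pairs in which both mates aligned, concordantly or discordantly.
    aligned_pairs = (
        count("aligned concordantly exactly 1 time", text)
        + count("aligned concordantly >1 times", text)
        + count("aligned discordantly 1 time", text)
    )
    # Mates of the remaining pairs that aligned on their own.
    aligned_mates = count("aligned exactly 1 time", text) + count("aligned >1 times", text)
    return 100 * (2 * aligned_pairs + aligned_mates) / (2 * total_pairs)


if __name__ == "__main__":
    print(f"{overall_rate_paired(SUMMARY):.2f}% overall alignment rate")
```

In numbers: (8823 + 527 + 34) pairs contribute 18768 aligned mates, the 571 + 1 individually aligned mates bring the total to 19340, and 19340 / 20000 gives 96.70%.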

Paired-end alignment

bowtie2 [options]* -x <bt2-idx> -1 <fq1> -2 <fq2> -S <sam_output> -p <threads> 2>Align.summary

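Paired-end mode works much like single-end mode; only the options that pass in the FASTQ files change.
-1: FASTQ file containing the mate 1 reads
-2: FASTQ file containing the mate 2 reads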
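When the same alignment has to be run over many samples, it can help to assemble the command in a script. Below is a minimal Python sketch that builds the argument list for either mode using only the options described above (-x, -U, -1, -2, -S, -p). The function name `bowtie2_command` and all file names are placeholders of my own.

```python
def bowtie2_command(index_prefix, sam_output, threads, reads1, reads2=None):
    """Build the bowtie2 argument list for single-end or paired-end input.

    If reads2 is None, reads1 is passed with -U (single-end);
    otherwise reads1 and reads2 are passed with -1 and -2 (paired-end).
    """
    cmd = ["bowtie2", "-x", index_prefix]
    if reads2 is None:
        cmd += ["-U", reads1]
    else:
        cmd += ["-1", reads1, "-2", reads2]
    cmd += ["-S", sam_output, "-p", str(threads)]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = bowtie2_command("GRCh38", "sample.sam", 8, "sample_R1.fq", "sample_R2.fq")
    print(" ".join(cmd))
    # To actually run it and capture the summary like "2>Align.summary":
    #   import subprocess
    #   with open("Align.summary", "w") as summary:
    #       subprocess.run(cmd, stderr=summary, check=True)
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.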

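Bowtie 2 has many more alignment parameters that can be tuned; I will not cover them one by one here. The next section describes the columns of the SAM file it outputs.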

SAM OUTPUT

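Each alignment line of the SAM file describes one read and contains at least 12 tab-separated fields. From left to right, the fields are:

  1. Name of the read
  2. Sum of the flags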

In Bowtie 2, the flags have the following meanings:
1      The read is one of a pair
2      The alignment is one end of a proper paired-end alignment
4      The read has no reported alignments
8      The read is one of a pair and has no reported alignments
16     The alignment is to the reverse reference strand
32     The other mate in the paired-end alignment is aligned to the reverse reference strand
64     The read is mate 1 in a pair
128    The read is mate 2 in a pair
Note that the way these flags are set can differ somewhat between aligners.
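Because the second column stores the sum of the flags, a value such as 99 has to be split back into its individual bits before it can be read. The following Python sketch does this with a bitwise AND against each value in the table above; the function name `decode_flag` is my own.

```python
# Flag values and meanings as listed in the table above.
FLAG_MEANINGS = {
    1: "The read is one of a pair",
    2: "The alignment is one end of a proper paired-end alignment",
    4: "The read has no reported alignments",
    8: "The read is one of a pair and has no reported alignments",
    16: "The alignment is to the reverse reference strand",
    32: "The other mate in the paired-end alignment is aligned to the reverse reference strand",
    64: "The read is mate 1 in a pair",
    128: "The read is mate 2 in a pair",
}


def decode_flag(flag):
    """Return the individual flag values whose bits are set in flag."""
    return [bit for bit in FLAG_MEANINGS if flag & bit]


if __name__ == "__main__":
    for bit in decode_flag(99):
        print(bit, FLAG_MEANINGS[bit])
```

For example, 99 = 1 + 2 + 32 + 64: the read is mate 1 of a properly aligned pair, and its mate aligned to the reverse strand.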

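  3. Name of the reference sequence (chromosome) the read aligned to
  4. 1-based position on the forward reference strand of the leftmost base of the alignment
  5. Mapping quality
  6. CIGAR string describing the alignment
  7. Name of the reference sequence where the mate aligned: = if it is the same as for this read, * if there is no mate
  8. 1-based position on the forward reference strand of the leftmost base of the mate's alignment; 0 if there is no mate
  9. Inferred fragment length. The value is negative if the mate's alignment occurs upstream of this alignment. It is 0 if the mates did not align concordantly, but non-0 if the mates aligned discordantly to the same chromosome
  10. Read sequence (reverse-complemented if the read aligned to the reverse strand)
  11. ASCII-encoded base qualities of the read
  12. Optional fields, including the following: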
AS:i:<N> Alignment score. Can be negative. Can be greater than 0 in --local mode (but not in --end-to-end mode). Only present if SAM record is for an aligned read. 
XS:i:<N> Alignment score for the best-scoring alignment found other than the alignment reported. Can be negative. Can be greater than 0 in --local mode (but not in --end-to-end mode). Only present if the SAM record is for an aligned read and more than one alignment was found for the read. Note that, when the read is part of a concordantly-aligned pair, this score could be greater than AS:i. 
YS:i:<N> Alignment score for opposite mate in the paired-end alignment. Only present if the SAM record is for a read that aligned as part of a paired-end alignment. 
XN:i:<N> The number of ambiguous bases in the reference covering this alignment. Only present if SAM record is for an aligned read. 
XM:i:<N> The number of mismatches in the alignment. Only present if SAM record is for an aligned read. 
XO:i:<N> The number of gap opens, for both read and reference gaps, in the alignment. Only present if SAM record is for an aligned read. 
XG:i:<N> The number of gap extensions, for both read and reference gaps, in the alignment. Only present if SAM record is for an aligned read. 
NM:i:<N> The edit distance; that is, the minimal number of one-nucleotide edits (substitutions, insertions and deletions) needed to transform the read string into the reference string. Only present if SAM record is for an aligned read. 
YF:Z:<S> String indicating the reason why the read was filtered out (see the Filtering section of the Bowtie 2 manual). Only appears for reads that were filtered out.
YT:Z:<S> Value of UU indicates the read was not part of a pair. Value of CP indicates the read was part of a pair and the pair aligned concordantly. Value of DP indicates the read was part of a pair and the pair aligned discordantly. Value of UP indicates the read was part of a pair but the pair failed to align either concordantly or discordantly.
MD:Z:<S> A string representation of the mismatched reference bases in the alignment.
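To show how the fields above fit together, here is a minimal Python sketch that splits one SAM alignment line into its 11 fixed columns and the optional TAG:TYPE:VALUE fields. The column names (QNAME, FLAG, and so on) follow the SAM specification; the function name `parse_sam_line` and the example record are made up for illustration and are not real Bowtie 2 output.

```python
# Names of the 11 fixed SAM columns, in order.
SAM_FIELDS = [
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_line(line):
    """Parse one SAM alignment line (not a header line) into a dict."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(SAM_FIELDS):
        raise ValueError("expected at least 11 tab-separated columns")
    record = {}
    for name, value in zip(SAM_FIELDS, columns):
        record[name] = int(value) if name in INT_FIELDS else value
    # Optional fields look like AS:i:-5 or YT:Z:CP.
    tags = {}
    for field in columns[len(SAM_FIELDS):]:
        tag, type_code, value = field.split(":", 2)
        tags[tag] = int(value) if type_code == "i" else value
    record["tags"] = tags
    return record


# Invented example record for a concordantly aligned mate 1.
EXAMPLE = "\t".join([
    "read1", "99", "chr1", "10468", "42", "36M", "=", "10600", "168",
    "A" * 36, "I" * 36,
    "AS:i:0", "XN:i:0", "XM:i:0", "XO:i:0", "XG:i:0", "NM:i:0",
    "MD:Z:36", "YS:i:0", "YT:Z:CP",
])

if __name__ == "__main__":
    record = parse_sam_line(EXAMPLE)
    print(record["RNAME"], record["POS"], record["tags"]["YT"])
```

For real analyses a dedicated library such as pysam or the samtools command line is usually a better choice; the sketch is only meant to make the column layout tangible.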

That concludes my notes on genome alignment with Bowtie 2; I will add to them as I learn more.

ref:
http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#how-is-bowtie-2-different-from-bowtie-1

End.
