The second column of a BAM file holds numbers such as 99, 147, 83, 77 and 141. This is the FLAG field, and it encodes the alignment status of the read.
samtools view FA51.ht2p.bam|less
ST-E00192:599:HHWGGCCXY:3:1101:1032:12842 99 14 49862576 60 149M = 49862627 200 CAGGCTGGAGTGCAGTGGCTATTCACAGGCGCGATCCCACTACTGATCAGCACGGGAGTTTTGACCTGCTCCGTTTCCGACCTGGGCCGGTTCACCCCTCCTTAGGCAACCTGGTGGTCCCCCGCTCCCGGGAGGTCACCATATTGATG AAFFAJJJFFJJ<A<AJJJJJJJJFFFJJJJJJFJJJFF<JA7FJFF7FJJFFJAJJJAFJJJJJJJJJFJFAJJJ-FJJFJJ-FJJJJJJJJJJAJJFJJJFFJJJJJFFAJFFJJAFJJ<-AJ<F<JJJJ)FF7<AFJJJ7AAFAJF AS:i:0 ZS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:149 YS:i:0 YT:Z:CP NH:i:1
ST-E00192:599:HHWGGCCXY:3:1101:1032:12842 147 14 49862627 60 149M = 49862576 -200 ACGGGAGTTTTGACCTGCTCCGTTTCCGACCTGGGCCGGTTCACCCCTCCTTAGGCAACCTGGTGGTCCCCCGCTCCCGGGAGGTCACCATATTGATGCCGAACTTAGTGCGGACACCCGATCGGCATAGCGCACTACAGCCCAGAACT )<<J<<777-<-))A7))-7FAFFA<A<<<-)))FJFJ<<<7)JJJFJJFFFJAJJJJJJJFFF77-FJJFJ<<<JJJA7FJA<AJ7JJJJJFFFJF<FFAFJJJJJJJJFJF<JJJFFFJAJJFJJJJJJFJFF7JJJFJF-FAFAAA AS:i:0 ZS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:149 YS:i:0 YT:Z:CP NH:i:1
ST-E00192:599:HHWGGCCXY:3:1101:1032:13650 99 12 46365362 60 149M = 46365521 308 CTAGATGAACATTTACTCAGGTTAGAAGTCGGGTATAAAACAGTGTAACTTTTGTCCTATTACTTGCTCTGGGAAGGCTGCCAGCTGATTCATAACACAAAATGCTTCTCAAAGACTTGATACTAATGCAGGGAACAAAGCTTGTTTGT AA<AF<FFFJJJ7A-FF77<FJFJJJJJJJJJJ<FJJ-7<FFFFFJJFFJJAFJJJ--<-<FJJJF-F-AAFA7AFFJJJF7<J--FAFJ-JJFFJFJFFFJAJFFAJAFFF7AF-7AJFAAFA7-7FAAFAJ7<AJ-AFA--7AJAJF AS:i:-6 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:57C1C89 YS:i:-11 YT:Z:CP NH:i:1
ST-E00192:599:HHWGGCCXY:3:1101:1032:13650 147 12 46365521 60 149M = 46365362 -308 ATCTTATCTTTAAGCACTTTCAGAAAAATTATCCACCAACTTTGAAACAGCTTTAGCATGTAATTTCTAATGCTGGAAACTAAAAAATTTTAAAACAAGTTTTTTTCCAGATGCGTTAGCAAATATTATTAAATATTTCAATGAGGAAA 7------7----A<AFA7<JFA--<7-<---))<JFJF7JJA-7--JJAFA<-<<-AAFA<-7-7A7J7F-FA<7777<--JJAJF--JF<JJFAFAA-<J<AJ--JFFJFJF7-JJFFJJJF<FFFJJJJ<FF7--FFF-7A7AA<AA AS:i:-11 XN:i:0 XM:i:4 XO:i:0 XG:i:0 NM:i:4 MD:Z:30G1A11G0T103 YS:i:-6 YT:Z:CP NH:i:1
ST-E00192:599:HHWGGCCXY:3:1101:1032:14600 99 14 49862566 60 149M = 49862682 262 CTATGTTGCTCAGGCTGGAGTGCAGTGGCTATTCACAGGCGCGATCCCACTACTGATCAGCACGGGAGTTTTGACCTGCTCCGTTTCCGACCTGGGCCGGTTCACCCCTCTTTAGGCAACCTGGTGGTCCCCCGCTCCCGGGAGGTCAC AAF<FFJJAF7AJ<JJJJ77<AFJ-7AJ7FJFJ7<<<FJJ-AFJJJJJ<-F-<FFA7FJAJJJFJFA<7<FFFFJFJJJFFJJJFJFJJJJAJAJJ<AJJAJJAJJJJJF-AFAFFFFJFFJFFJJFJ<7F<)-)--JF7JFJAFJ77< AS:i:-3 ZS:i:-8 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:110C38 YS:i:0 YT:Z:CP NH:i:1
ST-E00192:599:HHWGGCCXY:3:1101:1032:14600 147 14 49862682 60 146M = 49862566 -262 CAACCTGGTGGTCCCCCGCTCCCGGGAGGTCACCATATTGATGCCGAACTTAGTGCGGACACCCGATCGGCATAGCGCACTACAGCCCAGAACTCCTGGGCTCAAGCGATCCTCCCACCTCAGCCTCCCGAGTAGCTGGGACTACA FJAFF7F77J<-F7JJFF7)JF<<<)J7<7JAF77-F<A)7)JJF-FJA-JAA7-A--JFAJAA7777--7AJF<JJJJAAFJJJF7-A-AAJF<F<A7-<AFA7-A<<-<F7FJ<AF7-JFFJA-JFJF<AJJFA7A-JJF7J<F AS:i:0 ZS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:146 YS:i:-3 YT:Z:CP NH:i:1
ST-E00192:599:HHWGGCCXY:3:1101:1032:14846 99 14 49853665 60 41M8935N108M = 49853729 9020 ACAGGCGCGATCCCACTACTGATCAGCACGGGAGTTTGGACCTGCTCCGTTTCCGACCTGGGCCGGTTCACCCCTCCTTAGGCAACCTGGTGGTCCCCCGCTCCCGGGAGGTCACCATATTGATGCCGAACTTAGTGCGGACACCCGAT AAFFJAJJAJJ7JJJJJ7JJJJJJJJJJJFJJFJJJJ-F<-7AJA7F<FJJJA-FJ-<FJJJJFJJJJJA-7FJJJJJJAAJJJAJJJJJJJJF--7A<FJ7FJJJJJJJFFFJ<FJJJJJJ-7F7-AJFJFFFJJJJJJ<))A-AA)) AS:i:-8 ZS:i:-3 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:8A28T111 YS:i:-5 YT:Z:CP XS:A:- NH:i:1
ST-E00192:599:HHWGGCCXY:3:1101:1032:14846 147 14 49853729 60 46M8935N103M = 49853665 -9020 GGTTCCCCCCTCCTTAGGCAACCTGGTGGTCCCCCGCTCCCGAGAGGTCACCATATTGATGCCGAACTTAGTGCGGACACCCGATCGGCATAGCGCACTACAGCCCAGAACTCCTGGGCTCAAGCGATCCTCCCACCTCAGCCTCCCGA --7<-))<<7--FFF<7)A7<A777FF-7)JF<<)<-7J<<7-AFF7FF--FF<A---7F-JJFAFA-AJJJJFFA7777<JF<<AF-FF-<AA--FJJFJJJJJFF7JJFAJJA-AJF7JJJJFAA-JFAFFFFJJ<FFJJAAJFAAA AS:i:-5 ZS:i:-5 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:5A36G106 YS:i:-8 YT:Z:CP XS:A:+ NH:i:1
ST-E00192:599:HHWGGCCXY:3:1101:1032:14986 99 14 49862563 60 148M1S = 49862662 245 TCGCTATGTTGCTCAGGCTGGAGTGCAGTGGCTATTCACAGGCGCGATCCCACTACTGATCAGCATGGGAGTTTTGACCTGCTCCGTTTCCGACCTGGGCCGGTTCACCCCTCCTTAGGCAACCTGGTGGTCCCCCGCTCCCGGGAGGG AAF7AJJFJJ<FF7FFJFAFJJJFJ-FJJ--FJJJJJ<F7-AJJ<JJJ-7FJF-FJFJJJ-7F---<--AJAJJJJJJJAJ<JAAJAJFFFJJAJJ7J-77FFJFJJJJ-7A-7FJFJJFFJJAFFJAJJJFFJJJ)7AFJJJJJAJJ- AS:i:-4 ZS:i:-9 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:65C82 YS:i:-5 YT:Z:CP NH:i:1
ST-E00192:599:HHWGGCCXY:3:1101:1032:14986 147 14 49862662 60 138M79055N8M = 49862563 -245 CCGGTTCACCCCTCCTTAGACAACCTGGTGGGCCCCCGCTCCCGGGAGGTCACCATATTGATGCCGAACTTAGTGCGGACACCCGATCGGCATAGCGCACTACAGCCCAGAACTCCTGGGCTCAAGCGATCCTCCCACCTCAGCCT )7<--<-))-A-<-<-<A<-AJAA<7-<-)-)-A-FFJA)7-<)7)<)7--AJJFJFFA-77AJ7-JJFAJJJAJJJJF77FJAFFJAAJFJFFA<AF7AF<JF-<<JFJF-AJAAAA-AJJAJA-7-7F77-FJFFA7AF-AFA- AS:i:-5 ZS:i:-5 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:19G11T114 YS:i:-4 YT:Z:CP XS:A:- NH:i:1
ST-E00192:599:HHWGGCCXY:3:1101:1032:15373 83 14 49586700 60 149M = 49586621 -228 TTCGGCATCAATATGGTGACCTCCCGGGAGCGGGGGACCACCAGGTTGCCTAAGGAGGGGTGAACCGGCCCAGGTCGGAAACGGAGCAGGTCAAAACTCCCGTGCTGATCAGTAGTGGGATCGCGCCTGTGAATAGCCACTGCACTCCA FJJAFFAAJJJJJJJ<--)JJAJAAAFJA<FAAF7FJJJJJJJJJJJJJJFJJJJJJJJJAA77JJJJJJJJJJFJJFFJFJJJJJJJJJJJJJF-J7JJJJJJJJJ<JJJJFJJJJJAJFJ<<JJJJJJJJJFJJJJ7JJJJJF<AAA AS:i:0 ZS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:149 YS:i:0 YT:Z:CP NH:i:1
ST-E00192:599:HHWGGCCXY:3:1101:1032:15373 163 14 49586621 60 149M = 49586700 228 GAGGCTGAGGCTGGAGGATCGCTTGAGTCCAGGAGTTCTGGGCTGTAGTGCGCTATGCCGATCGGGTGTCCGCACTAAGTTCGGCATCAATATGGTGACCTCCCGGGAGCGGGGGACCACCAGGTTGCCTAAGGAGGGGTGAACCGGCC AAAFF<JJJJJFJJJJJJAJJJFFJJJJJJFJJJJJJJJJJJJFJJJJJJJJJFJFJA7FFFJFJJFJA--FFA7AJ<JA7AJA<AAFJJFJFJF<FAJJJJ77FJJ7AAFJJJF--)A<A7AJA<FAA-A7AF<FFJJAJF-7-AF<< AS:i:0 ZS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:149 YS:i:0 YT:Z:CP NH:i:1
ST-E00192:599:HHWGGCCXY:3:1101:1032:17483 77 * 0 0 * * 0 0 GAGGAGTTATTGGGGTTGAATTTGCACAAATATTTGCAACCGCAGGAACAAAAGTTACTATTTTACAAAACCTACCATTAATTTTAGCTAACCTTGATAGTGAAATTTCAAAGCAATTAAGTGCTAATTTAGAAAAATTAGGTGTAAAA AAAFFFAF<FFJAJAF<JJJJJJFJJJJJAJ-JJ-F-<JJJAFJJJJJ-FJJJFJJJ<J<JJJJJJJ---AFJA7JJJJJFJJJJFJJJJJJ<JJJJJJAJJFJAJJJFA<-77FF-FJJJJ7<FFAAAJAJ-7-77AFAFJJJJJJJF YT:Z:UP
ST-E00192:599:HHWGGCCXY:3:1101:1032:17483 141 * 0 0 * * 0 0 TCAGATTTAATTCTGTGTTCTTGATTATCAACTGTGTAAACAACTTCATCGTTTTCAAATCTTTGAGTTGTTGCATTTGTAACGATTTTTACACCTAATTTTTCTAAATTAGCACTTACTTGCTTTGAAATTTCACTATCAAGGTTAGC AAAFJAJJJJFJJJFJJJF-FFF-FFJJJJ-FJJJJFJF<AJJFJJF7F-FFJJJFJJJJ<JJJJ-FFJ-FJAF-AJJJJJJJJJJJJFJFFF--A-7FFFJJAAJJ7FFJJA--7A--<FA-<<FFAF-<7AFJ-7-<-<7F<F-<F- YT:Z:UP
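To see which flag values occur in a file, the second column can be tallied. The following is a minimal Python sketch, not part of the original workflow. It assumes tab-separated SAM text as printed by samtools view. The RECORDS list is an abbreviated stand-in for the listing above: columns 1 to 9 are copied from it, SEQ and QUAL are replaced by "*", and the optional tags are dropped. The helper names flag_of and count_flags are made up for illustration.

```python
from collections import Counter

# Abbreviated records modeled on the listing above (assumption: tab-separated,
# SEQ/QUAL replaced by "*", optional tags dropped to keep the example short).
RECORDS = [
    "\t".join(fields)
    for fields in [
        ("ST-E00192:599:HHWGGCCXY:3:1101:1032:12842", "99", "14",
         "49862576", "60", "149M", "=", "49862627", "200", "*", "*"),
        ("ST-E00192:599:HHWGGCCXY:3:1101:1032:12842", "147", "14",
         "49862627", "60", "149M", "=", "49862576", "-200", "*", "*"),
        ("ST-E00192:599:HHWGGCCXY:3:1101:1032:17483", "77", "*",
         "0", "0", "*", "*", "0", "0", "*", "*"),
        ("ST-E00192:599:HHWGGCCXY:3:1101:1032:17483", "141", "*",
         "0", "0", "*", "*", "0", "0", "*", "*"),
    ]
]


def flag_of(sam_line):
    """Return the FLAG, i.e. the second tab-separated column, as an int."""
    return int(sam_line.rstrip("\n").split("\t")[1])


def count_flags(lines):
    """Count how often each FLAG value occurs, skipping headers and blanks."""
    counts = Counter()
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        counts[flag_of(line)] += 1
    return counts


if __name__ == "__main__":
    for flag, n in sorted(count_flags(RECORDS).items()):
        print(flag, n)
```

For a real file, the same count_flags function could be fed sys.stdin while samtools view output is piped into the script.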
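Taking a BAM file from paired-end sequencing as an example, these numbers fall into three cases:
1. Both reads are aligned
99=1+2+32+64: both reads of the pair are aligned to the genome; this is read1, aligned to the forward strand of the reference
147=1+2+16+128: both reads of the pair are aligned to the genome; this is read2, aligned to the reverse strand of the reference
83=1+2+16+64: both reads of the pair are aligned to the genome; this is read1, aligned to the reverse strand of the reference
163=1+2+32+128: both reads of the pair are aligned to the genome; this is read2, aligned to the forward strand of the reference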
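2. Neither read is aligned
77=1+4+8+64: neither read of the pair is aligned to the reference genome; this is read1
141=1+4+8+128: neither read of the pair is aligned to the reference genome; this is read2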
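3. One read is aligned and the other is not
69=1+4+64: this is read1 and it is not aligned to the reference genome; its mate, read2, is aligned to the forward strand (bit 32 is not set)
153=1+8+16+128: this is read2, aligned to the reverse strand of the reference (bit 16 is set); its mate, read1, is not aligned
89=1+8+16+64: this is read1, aligned to the reverse strand of the reference; its mate, read2, is not aligned
133=1+4+128: this is read2 and it is not aligned to the reference genome; its mate, read1, is aligned to the forward strand (bit 32 is not set)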
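The three cases above depend only on bits 1, 4 and 8, so they can be told apart with bitwise tests. Here is a minimal Python sketch of that idea. The function name pair_category and its labels are made up for illustration, and the DECOMPOSITIONS table simply restates the sums from the list so that the arithmetic can be checked. Unlike the examples in case 1, the sketch does not require bit 2 (proper pair) for "both mapped".

```python
PAIRED, UNMAP, MUNMAP = 0x1, 0x4, 0x8

# The sums given in the list above, restated so they can be verified.
DECOMPOSITIONS = {
    99: (1, 2, 32, 64), 147: (1, 2, 16, 128),
    83: (1, 2, 16, 64), 163: (1, 2, 32, 128),
    77: (1, 4, 8, 64), 141: (1, 4, 8, 128),
    69: (1, 4, 64), 153: (1, 8, 16, 128),
    89: (1, 8, 16, 64), 133: (1, 4, 128),
}


def pair_category(flag):
    """Classify a paired-end FLAG by the mapping status of read and mate."""
    if not flag & PAIRED:
        return "not paired"
    read_unmapped = bool(flag & UNMAP)
    mate_unmapped = bool(flag & MUNMAP)
    if read_unmapped and mate_unmapped:
        return "both unmapped"
    if read_unmapped or mate_unmapped:
        return "one mapped"
    return "both mapped"


if __name__ == "__main__":
    for flag, terms in DECOMPOSITIONS.items():
        assert sum(terms) == flag  # each flag equals the sum of its bits
        print(flag, pair_category(flag))
```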
Each number is a sum of powers of two, namely the terms after the equals sign. Run samtools flags followed by a number to see what it stands for.
samtools flags 1
#0x1 1 PAIRED
samtools flags 2
#0x2 2 PROPER_PAIR
samtools flags 4
#0x4 4 UNMAP
samtools flags 8
#0x8 8 MUNMAP
samtools flags 16
#0x10 16 REVERSE
samtools flags 32
#0x20 32 MREVERSE
samtools flags 64
#0x40 64 READ1
samtools flags 128
#0x80 128 READ2
samtools flags 256
#0x100 256 SECONDARY
samtools flags 512
#0x200 512 QCFAIL
samtools flags 1024
#0x400 1024 DUP
samtools flags 2048
#0x800 2048 SUPPLEMENTARY
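The same lookup also works for combined values. The Python sketch below decodes any flag into the names shown in the listing above. It is an illustration, not the samtools implementation: the table is copied from that listing, the function names flag_names and describe are made up, and the tab-separated output format is an assumption modeled on the lines above.

```python
# Bit values and names as printed by the samtools flags calls above.
FLAG_NAMES = [
    (0x1, "PAIRED"), (0x2, "PROPER_PAIR"), (0x4, "UNMAP"), (0x8, "MUNMAP"),
    (0x10, "REVERSE"), (0x20, "MREVERSE"), (0x40, "READ1"), (0x80, "READ2"),
    (0x100, "SECONDARY"), (0x200, "QCFAIL"), (0x400, "DUP"),
    (0x800, "SUPPLEMENTARY"),
]


def flag_names(flag):
    """Return the names of all bits that are set in flag."""
    return [name for bit, name in FLAG_NAMES if flag & bit]


def describe(flag):
    """Format a flag as hex, decimal and comma-separated names."""
    return f"{flag:#x}\t{flag}\t{','.join(flag_names(flag))}"


if __name__ == "__main__":
    for flag in (99, 147, 77, 141):
        print(describe(flag))
```

With samtools itself, passing a combined value such as 99 to samtools flags gives the same kind of breakdown without any scripting.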
These bits mean:
1: The read is one of a pair
2: The alignment is one end of a proper paired-end alignment
4: The read has no reported alignments
8: The read is one of a pair and its mate has no reported alignments
16: The alignment is to the reverse reference strand
32: The other mate in the paired-end alignment is aligned to the reverse reference strand
64: The read is mate 1 in a pair
128: The read is mate 2 in a pair
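The same bits, with the binary value in parentheses:
1 (1): the read comes from paired-end sequencing
2 (10): both reads of the pair are properly aligned to the reference
4 (100): this read is not aligned to the reference
8 (1000): the mate of this read is not aligned to the reference
16 (10000): this read is aligned to the reverse strand of the reference
32 (100000): the mate of this read is aligned to the reverse strand of the reference
64 (1000000): this read is read1 of the pair
128 (10000000): this read is read2 of the pair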
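The binary values in parentheses show why the sums work: every meaning occupies its own bit, so a flag is just the pattern of bits that are switched on. A small Python sketch of that view follows. The helper names to_binary and set_bits are made up for illustration.

```python
def to_binary(flag):
    """Return the flag as a binary string without the '0b' prefix."""
    return format(flag, "b")


def set_bits(flag):
    """Return the powers of two that add up to flag, smallest first."""
    return [1 << i for i in range(flag.bit_length()) if (flag >> i) & 1]


if __name__ == "__main__":
    for flag in (99, 147):
        print(flag, to_binary(flag), "+".join(map(str, set_bits(flag))))
```

For 147, the binary string is 10010011, and the set bits are 1, 2, 16 and 128, matching the decomposition 147=1+2+16+128 given earlier.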
The meaning of the flags can be summarized in the figure below.
Reference:
SAM Format Flag