Renaming chromosomes in a BAM file

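While processing ChIP-seq data I ran into the following problem: the chromosomes in my FASTA and GTF files were named 1, 2, 3, ..., 22, but some downstream analyses require chromosome names to start with chr. One example is ROSE, which is used to find enhancers and super-enhancers. In that case the BAM file has to be modified, as follows: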

samtools view -H ${id}_input.deduplicate.bam | sed -e 's/SN:\([0-9XY]\)/SN:chr\1/' -e 's/SN:MT/SN:chrM/' | samtools reheader - ${id}_input.deduplicate.bam > ${id}_input.deduplicate.chr.bam
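To see what the two sed expressions do without touching a real BAM file, here is a minimal sketch that runs them on a small text header. The header lines and contig lengths are made up for illustration, and the function names `sample_header` and `rename_chroms` are hypothetical helpers, not part of samtools.

```shell
#!/bin/sh
# Hypothetical sample header in SAM text form; the lengths are invented.
sample_header() {
    printf '@HD\tVN:1.6\tSO:coordinate\n'
    printf '@SQ\tSN:1\tLN:1000\n'
    printf '@SQ\tSN:22\tLN:900\n'
    printf '@SQ\tSN:X\tLN:800\n'
    printf '@SQ\tSN:MT\tLN:100\n'
    printf '@SQ\tSN:GL000192.1\tLN:50\n'
}

# The same two sed expressions as in the command above:
# 1) prefix names starting with a digit, X or Y with "chr"
# 2) rename MT to chrM
rename_chroms() {
    sed -e 's/SN:\([0-9XY]\)/SN:chr\1/' -e 's/SN:MT/SN:chrM/'
}

sample_header | rename_chroms
```

In this sketch 1, 22 and X become chr1, chr22 and chrX, MT becomes chrM, and a contig such as GL000192.1 is left unchanged because its name does not start with a digit, X or Y. On a real file, `samtools view -H` on the new BAM piped through `grep '^@SQ'` is a quick way to check the result.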

Only the header of the BAM file needs to be changed, which is quick and convenient. As for why changing the header is enough, I found a great answer on Biostars, quoted here:

samtools reheader <in.header.sam> <in.bam>
Replace the header in in.bam with the header in in.header.sam. This command is much faster than replacing the header with a BAM→SAM→BAM conversion.
For those reading this and wondering, "but what about the chromosome names for each read?!?", the answer is that those names aren't actually stored in a BAM file. Rather, alignments have chromosome index number associated with them and the name you see when you use samtools view is taken from the header.

https://www.biostars.org/p/386231/
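The point made in the quoted answer can be illustrated with a toy model. This is only a sketch: real BAM is a compressed binary format, and the function names `resolve_names` and `alignments`, as well as the records themselves, are hypothetical. Each record stores a 0-based reference index rather than a name, and the name is looked up in the header list when the record is printed.

```shell
#!/bin/sh
# Toy model only: real BAM records are binary, not tab-separated text.
# resolve_names NAMES: read "refID<TAB>pos" records on stdin and print
# "name<TAB>pos", looking each name up by index in the space-separated list.
resolve_names() {
    # refID is 0-based, awk arrays from split() are 1-based, hence the + 1.
    awk -v names="$1" 'BEGIN { n = split(names, ref, " "); FS = OFS = "\t" }
         { print ref[$1 + 1], $2 }'
}

# Two hypothetical alignments, stored with a reference index instead of a name.
alignments() {
    printf '0\t100\n'
    printf '2\t250\n'
}

alignments | resolve_names "1 2 MT"          # names taken from the old header
alignments | resolve_names "chr1 chr2 chrM"  # same records, new header
```

The alignment records are identical in both calls; only the list of names changes, which is why replacing the header is enough.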
