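SAMtools is a toolkit for working with SAM-format files. Its functions include converting between SAM and BAM in either direction, merging files, sorting, and building indexes.
SAM (Sequence Alignment/Map) is currently the standard format for storing alignment data in high-throughput sequencing. It records how reads align to a reference sequence and uses tabs as the field separator. BAM is the binary form of SAM: it takes up less disk space than the SAM text file, and operations on BAM files run faster.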
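Because SAM is tab-delimited text, individual fields can be pulled out with standard command-line tools. The following is only a small sketch: the alignment record is made up for illustration, and sam_summary is a hypothetical helper, not part of samtools.

```shell
#!/bin/sh
# A made-up alignment record (hypothetical values) holding the 11 mandatory
# SAM fields, separated by tabs:
# QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
sam_line=$(printf 'read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII')

# Print the read name (col 1), reference name (col 3), position (col 4)
# and CIGAR string (col 6) of one SAM record.
sam_summary() {
    printf '%s\n' "$1" | awk -F'\t' '{ print $1, $3, $4, $6 }'
}

sam_summary "$sam_line"
```

On a real file the same awk expression could be fed from `samtools view test.bam`, which prints the records as SAM text.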
Basic functions
$ samtools --help
Program: samtools (Tools for alignments in the SAM format)
Version: 1.3.1 (using htslib 1.3.1)
Usage: samtools <command> [options]
Commands:
-- Indexing
dict create a sequence dictionary file
faidx index/extract FASTA
index index alignment
-- Editing
calmd recalculate MD/NM tags and '=' bases
fixmate fix mate information
reheader replace BAM header
rmdup remove PCR duplicates
targetcut cut fosmid regions (for fosmid pool only)
addreplacerg adds or replaces RG tags
-- File operations
collate shuffle and group alignments by name
cat concatenate BAMs
merge merge sorted alignments
mpileup multi-way pileup
sort sort alignment file
split splits a file by read group
quickcheck quickly check if SAM/BAM/CRAM file appears intact
fastq converts a BAM to a FASTQ
fasta converts a BAM to a FASTA
-- Statistics
bedcov read depth per BED region
depth compute the depth # reports the sequencing depth at each base position; the BAM file must be sorted first, and indexed if a specific region is queried
flagstat simple stats
idxstats BAM index stats
phase phase heterozygotes
stats generate stats (former bamcheck)
-- Viewing
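flags explain BAM flags # shows what each flag value means
tview text alignment viewer # shows how reads align to the reference genome, much like a genome browser; the BAM file must be sorted and indexed first
view SAM<->BAM<->CRAM conversion # converts between the SAM and BAM formats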
depad convert padded BAM to unpadded BAM
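The FLAG column that `samtools flags` explains is a bit field, so each flag value is a sum of powers of two. As a rough sketch of the idea, the hypothetical helper decode_flag below (not part of samtools) tests each bit in turn; the bit names follow the ones used in the SAM specification.

```shell
#!/bin/sh
# Decode a SAM FLAG integer into the names of the bits that are set.
# decode_flag is a hypothetical helper written for illustration.
decode_flag() {
    flag=$1
    result=""
    bit=1
    # Names are listed from the lowest bit (0x1) to the highest (0x800).
    for name in PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE \
                READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY; do
        # A non-zero bitwise AND means this bit is set in the flag.
        if [ $(( flag & bit )) -ne 0 ]; then
            result="${result:+$result,}$name"
        fi
        bit=$(( bit * 2 ))
    done
    printf '%s\n' "$result"
}

decode_flag 99   # 99 = 1 + 2 + 32 + 64
```

For 99 this prints PAIRED,PROPER_PAIR,MREVERSE,READ1, that is, the first read of a properly paired pair whose mate is on the reverse strand.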
Usage examples
cd ~/RNA-seq_analysation/refe_genome
# Build an index for the reference genome
## faidx -- indexes a FASTA file; the resulting index file ends in .fai
samtools faidx GCA_002173515.1_ASM217351v1_genomic.fna
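# Convert the SAM file to a BAM file
## Example commands:
## samtools view -bS -1 test.sam > test.bam # SAM to BAM
## samtools view -h test.bam > test.sam # BAM to SAM
## For some reason, the output of my hisat2 alignment was already a BAM file, so I skipped this step
# Sort the BAM file
## sort -- sort alignment file: sorts the aligned BAM file, by coordinate by default
## sort is normally run on BAM files; samtools 1.3 and later also accept SAM or CRAM input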
samtools sort SRR20958310.bam > SRR20958310.sorted.bam
# Build an index for the sorted BAM file; this produces a .bai file
samtools index SRR20958310.sorted.bam
# Compute the sequencing depth
samtools depth SRR20958310.sorted.bam > SRR20958310.depth.txt
# View the alignment results interactively
samtools tview SRR20958310.sorted.bam GCA_002173515.1_ASM217351v1_genomic.fna
Detailed explanation of tview
https://blog.csdn.net/qq_28723681/article/details/120519115
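The table written by `samtools depth` for a single BAM file has three tab-separated columns: reference name, position, and depth. A quick way to summarise it is sketched below; depth_summary is a hypothetical helper, and the three sample rows are invented. Note that, as far as I know, positions with zero coverage are left out by default, so the mean is over reported positions only.

```shell
#!/bin/sh
# Summarise a `samtools depth` table: number of reported positions and
# their mean depth. Reads the file named as an argument, or standard input.
depth_summary() {
    awk -F'\t' '{ sum += $3; n++ }
        END { if (n > 0) printf "positions=%d mean_depth=%.2f\n", n, sum / n }' "$@"
}

# Invented sample rows in the same layout (reference, position, depth).
printf 'chr1\t1\t10\nchr1\t2\t20\nchr1\t3\t30\n' | depth_summary
```

With the file produced above, the call would be `depth_summary SRR20958310.depth.txt`.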
Batch sorting and indexing
cd ~/RNA-seq_analysation/out_files/align
nano sambamindex.txt
# Enter the following commands
for i in 5 6 7 8 9;
do
samtools sort SRR2095830${i}.bam > SRR2095830${i}.sorted.bam
samtools index SRR2095830${i}.sorted.bam
done
# Press Ctrl+X, then Y and Enter, to save and exit
chmod +x sambamindex.txt
./sambamindex.txt
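The loop above keeps going even if one sort fails, and it assumes every input file exists. A slightly more defensive variant might look like the sketch below. The sort_and_index function and the SAMTOOLS variable are my own additions for illustration (setting SAMTOOLS=echo gives a dry run); they are not samtools features.

```shell
#!/bin/sh
# Sort and index each BAM file passed as an argument.
# SAMTOOLS can be overridden, e.g. SAMTOOLS=echo for a dry run
# (a convention of this sketch only).
SAMTOOLS=${SAMTOOLS:-samtools}

sort_and_index() {
    for bam in "$@"; do
        if [ ! -f "$bam" ]; then
            echo "skipping $bam: file not found" >&2
            continue
        fi
        sorted="${bam%.bam}.sorted.bam"
        # Stop at the first failure so that a broken file is never indexed.
        "$SAMTOOLS" sort -o "$sorted" "$bam" || return 1
        "$SAMTOOLS" index "$sorted" || return 1
    done
}

# Same five samples as in the loop above.
sort_and_index SRR2095830[5-9].bam
```

Writing the output with `-o` instead of redirecting stdout means a failed sort is less likely to leave behind a file that looks complete.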
References:
https://zhuanlan.zhihu.com/p/425912805
Complete list of samtools commands and options
https://www.jianshu.com/p/68f6e35fa4a2