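Sambamba: a tool for removing duplicates

Preface

Why use this tool?
Because I heard it is fast, and because samtools markdup and Picard had both burned me. When I ran samtools markdup, it told me to run fixmate first on a BAM sorted by read name, but I had already sorted mine the default way (by coordinate). Hmm. And after GATK Picard removed duplicates, the output was larger than the original file. Who knows what it added.

Here is the link to the tool's documentation: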

http://lomereiter.github.io/sambamba/docs/sambamba-markdup.html

Getting started

gzip -d sambamba-0.6.8.gz
chmod a+x sambamba-0.6.8

./sambamba-0.6.8

Download it, decompress it, and put it on your PATH. It is that simple; no installation is needed.
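For the "put it on your PATH" step, here is a minimal sketch. The install directory ~/bin is my own assumption (any directory on PATH works), and so is the version-free name sambamba for the copied binary.

```shell
# Hypothetical install location; any directory on PATH will do.
install_dir="${HOME}/bin"
mkdir -p "$install_dir"

# Copy the decompressed binary there under a version-free name, if it is present.
if [ -f sambamba-0.6.8 ]; then
    cp sambamba-0.6.8 "$install_dir/sambamba"
fi

# Make the directory visible to the current shell.
# Add this line to ~/.bashrc (or your shell's equivalent) to make it permanent.
export PATH="$install_dir:$PATH"
```

After that, typing sambamba in any directory should print the tool's usage message.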

NAME

sambamba-markdup - finding duplicate reads in BAM file

SYNOPSIS

sambamba markdup OPTIONS <input.bam> <output.bam>

DESCRIPTION

Marks (by default) or removes duplicate reads. For determining whether a read is a duplicate or not, the same criteria as in Picard are used.

OPTIONS

-r, --remove-duplicates
remove duplicates instead of just marking them

-t, --nthreads=NTHREADS
number of threads to use

-l, --compression-level=N
specify compression level of the resulting file (from 0 to 9)

-p, --show-progress
show progressbar in STDERR

--tmpdir=TMPDIR
specify directory for temporary files; default is /tmp

--hash-table-size=HASHTABLESIZE
size of hash table for finding read pairs (default is 262144 reads); will be rounded down to the nearest power of two; should be > (average coverage) * (insert size) for good performance

--overflow-list-size=OVERFLOWLISTSIZE
size of the overflow list where reads, thrown away from the hash table, get a second chance to meet their pairs (default is 200000 reads); increasing the size reduces the number of temporary files created

--io-buffer-size=BUFFERSIZE
controls sizes of two buffers of BUFFERSIZE megabytes each, used for reading and writing BAM during the second pass (default is 128)
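To show how these options fit together, here is a rough sketch, not an official recipe. The function names suggest_hash_table_size and run_markdup, the DRY_RUN switch, and all file names are my own inventions. The first helper picks a power of two larger than (average coverage) * (insert size), following the hint in the --hash-table-size description. The second assembles a command from the options listed above.

```shell
# Smallest power of two strictly greater than coverage * insert_size.
# A power of two is left unchanged by the rounding-down described above.
suggest_hash_table_size() {
    target=$(( $1 * $2 ))
    size=1
    while [ "$size" -le "$target" ]; do
        size=$((size * 2))
    done
    echo "$size"
}

# Build a markdup command from the documented options.
# Arguments: input BAM, output BAM, threads, tmp dir, hash table size.
# With DRY_RUN=1 the command is printed instead of executed.
run_markdup() {
    in_bam=$1
    out_bam=$2
    threads=${3:-4}
    tmp_dir=${4:-/tmp}
    hash_size=${5:-262144}

    # -r removes duplicates, -t sets the thread count, -p shows progress on STDERR.
    set -- sambamba markdup -r -t "$threads" -p \
        --tmpdir="$tmp_dir" --hash-table-size="$hash_size" \
        "$in_bam" "$out_bam"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example with hypothetical files, 1000x coverage and 400 bp inserts (printed, not run):
DRY_RUN=1 run_markdup sample.sorted.bam sample.dedup.bam 8 /scratch/tmp \
    "$(suggest_hash_table_size 1000 400)"
```

For ordinary coverage the default of 262144 is already large enough. With 30x coverage and 500 bp inserts, for example, the product is only 15000, so the extra option matters mainly for very deep data.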

Test

Duplicate removal is very fast: a 3 GB BAM file took only about 1 minute.
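If you want to check the timing on your own data, a small wrapper like the one below is enough. This is only a sketch: the function name timed is my own, and it reports whole seconds using date +%s, which I assume your system supports.

```shell
# Run any command, then print its wall-clock time in seconds on STDERR.
# The wrapped command's exit status is passed through unchanged.
timed() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start)) s" >&2
    return "$status"
}
```

Usage would look like timed sambamba markdup -r input.bam output.bam, where both file names are placeholders.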

