introduction
Bedtools is developed in the Quinlan lab at the University of Utah.
bedtools website
BED format: see the brief introduction at the end of this note.
setup
download bedtools2 from https://github.com/arq5x/bedtools2
make
sudo make install
Then just type bedtools to see the list of available subcommands.
bedtools --version #check version
bedtools --contact #feature requests, bug reports, mailing lists
download sample files
curl -O https://s3.amazonaws.com/bedtools-tutorials/web/cpg.bed
curl -O https://s3.amazonaws.com/bedtools-tutorials/web/exons.bed
curl -O https://s3.amazonaws.com/bedtools-tutorials/web/gwas.bed
curl -O https://s3.amazonaws.com/bedtools-tutorials/web/hesc.chromHmm.bed
bedtools "intersect"
The intersect command is the workhorse of the bedtools suite. It works with BED, BAM, VCF and GFF files.
Basic usage
bedtools intersect -a cpg.bed -b exons.bed > merge.out
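Reports only the overlapping portions of the intervals in cpg.bed and exons.bed. A CpG island in merge.out can therefore have different coordinates from the same island in cpg.bed: each reported interval is the part of an A interval that overlaps a B interval, so every interval in merge.out lies within an interval of cpg.bed (the A file).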
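For a single pair of overlapping features, the interval the basic command reports can be sketched with shell arithmetic. This is a minimal illustration assuming BED's 0-based, half-open coordinates; intersect_pair is a hypothetical helper, not a bedtools command.

```shell
# intersect_pair: hypothetical helper, NOT part of bedtools.
# Arguments: chrom, A start, A end, B start, B end (same chromosome assumed).
# Prints the overlapping portion, i.e. what the basic intersect reports.
intersect_pair() {
  chrom=$1 sa=$2 ea=$3 sb=$4 eb=$5
  start=$(( sa > sb ? sa : sb ))   # later of the two starts
  end=$(( ea < eb ? ea : eb ))     # earlier of the two ends
  # Half-open intervals: an overlap exists only if start < end,
  # so book-ended features such as 100-200 and 200-300 do not overlap.
  if [ "$start" -lt "$end" ]; then
    printf '%s\t%s\t%s\n' "$chrom" "$start" "$end"
  fi
}

intersect_pair chr1 100 200 150 300   # prints chr1 150 200 (tab-separated)
```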
Reporting the original intervals
Options -wa and -wb
bedtools intersect -a cpg.bed -b exons.bed -wa > mergeOrigin.out
Computes the overlaps, then reports the original, full-length A entry for each overlap.
bedtools intersect -a cpg.bed -b exons.bed -wb > mergeOrigin.out
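Computes the overlaps, then reports the overlapping portion (as in the basic command) followed by the original, full-length B entry.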
bedtools intersect -a cpg.bed -b exons.bed -wa -wb > mergeOrigin.out
Reports the original A entry followed by the original B entry for each overlapping pair.
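The three output layouts can be illustrated for one overlapping pair of 3-column records. This is only a sketch: show_layout is a hypothetical helper that mimics the column order described above, whereas real bedtools output carries every column of each input file.

```shell
# show_layout: hypothetical helper, NOT part of bedtools.
# Arguments: mode (wa | wb | wawb), chrom, A start, A end, B start, B end.
show_layout() {
  mode=$1 chrom=$2 sa=$3 ea=$4 sb=$5 eb=$6
  os=$(( sa > sb ? sa : sb ))   # start of the overlapping portion
  oe=$(( ea < eb ? ea : eb ))   # end of the overlapping portion
  case $mode in
    # -wa: original A entry
    wa)   printf '%s\t%s\t%s\n' "$chrom" "$sa" "$ea" ;;
    # -wb: overlapping portion, then original B entry
    wb)   printf '%s\t%s\t%s\t%s\t%s\t%s\n' "$chrom" "$os" "$oe" "$chrom" "$sb" "$eb" ;;
    # -wa -wb: original A entry, then original B entry
    wawb) printf '%s\t%s\t%s\t%s\t%s\t%s\n' "$chrom" "$sa" "$ea" "$chrom" "$sb" "$eb" ;;
  esac
}

show_layout wb chr1 100 200 150 300   # prints chr1 150 200 chr1 150 300 (tab-separated)
```

If one A feature overlaps several B features, -wa reports that A entry once per overlap.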
Counting overlapping base pairs
Option -wo
bedtools intersect -a cpg.bed -b exons.bed -wo > mergeNum.out
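Reports the original A and B entries of each overlapping pair plus the number of overlapping base pairs as the last column. Only A features that overlap something are reported.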
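Since the overlap size is the last column of -wo output, it is easy to post-process. A minimal sketch, assuming tab-separated -wo output on stdin; total_overlap_bp is a hypothetical helper name:

```shell
# total_overlap_bp: hypothetical helper. Reads -wo output on stdin and sums
# the last column (bp of overlap per A/B pair). Overlaps are counted once per
# pair, so bases covered by several B features are counted several times.
total_overlap_bp() {
  awk -F'\t' '{ total += $NF } END { print total + 0 }'
}

# Example: bedtools intersect -a cpg.bed -b exons.bed -wo | total_overlap_bp
```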
Counting how many B features overlap each A feature
Option -c
bedtools intersect -a cpg.bed -b exons.bed -c > mergeCount.out
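-c appends to each A entry the number of B features that overlap it (0 if none). The idea can be sketched in awk for tiny toy files; count_overlaps is a hypothetical helper that compares every A feature with every B feature, so it is only an illustration and nothing like bedtools' own algorithm.

```shell
# count_overlaps: hypothetical helper that mimics -c on small toy files.
# Usage: count_overlaps a.bed b.bed
# Prints the first three A columns plus the number of overlapping B features
# (real -c output keeps all A columns).
count_overlaps() {
  # Read B first ("$2"), keeping it in memory, then scan A ("$1").
  awk 'BEGIN { OFS = "\t" }
       NR == FNR { bc[NR] = $1; bs[NR] = $2 + 0; be[NR] = $3 + 0; nb = NR; next }
       { n = 0
         for (i = 1; i <= nb; i++)
           # same chromosome and half-open overlap: B start < A end, A start < B end
           if (bc[i] == $1 && bs[i] < $3 + 0 && $2 + 0 < be[i]) n++
         print $1, $2, $3, n }' "$2" "$1"
}
```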
Reporting A features that do not overlap any B feature
Option -v
bedtools intersect -a cpg.bed -b exons.bed -v > mergeDontOverlap.out
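Because -c reports a count of 0 for A features with no overlap, filtering -c output for zero counts should give the same A entries that -v reports (with the same -f settings). A small sketch; zero_count_rows is a hypothetical helper that assumes tab-separated input with the count as the last column:

```shell
# zero_count_rows: hypothetical helper. Reads -c output on stdin, keeps rows
# whose last column (the overlap count) is 0, and drops that count column.
zero_count_rows() {
  awk 'BEGIN { FS = OFS = "\t" }
       $NF == 0 {
         line = $1
         for (i = 2; i < NF; i++) line = line OFS $i
         print line
       }'
}

# Example (compare with the -v command above):
# bedtools intersect -a cpg.bed -b exons.bed -c | zero_count_rows
```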
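Setting a minimum overlap
Option -f
bedtools intersect -a cpg.bed -b exons.bed -f 0.5 > merge.out
-f sets the minimum overlap as a fraction of the A feature, so -f 0.5 requires at least half of a CpG island to be covered by an exon. Without -f, the two intervals only need to overlap by at least 1 bp.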
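The -f test amounts to comparing overlap length with A length. A minimal sketch, assuming the threshold is inclusive ("at least f"); passes_f is a hypothetical helper and does not reproduce bedtools' internal rounding.

```shell
# passes_f: hypothetical helper. Succeeds (exit status 0) when the overlap of
# A = [sa, ea) and B = [sb, eb) covers at least fraction f of A's length.
passes_f() {
  sa=$1 ea=$2 sb=$3 eb=$4 f=$5
  os=$(( sa > sb ? sa : sb ))
  oe=$(( ea < eb ? ea : eb ))
  ov=$(( oe - os ))                # overlap in bp (<= 0 means no overlap)
  [ "$ov" -gt 0 ] || return 1
  # Shell arithmetic is integer-only, so let awk compare the fraction.
  awk -v o="$ov" -v l="$(( ea - sa ))" -v f="$f" 'BEGIN { exit !(o / l >= f + 0) }'
}

# 50 bp of overlap on a 100 bp A feature is exactly 0.5
passes_f 100 200 150 300 0.5 && echo "passes -f 0.5"
```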
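Sorted input saves run time
Option -sorted
bedtools intersect -a cpg.bed -b exons.bed -sorted > merge.out
Adding -sorted speeds up the intersection and uses less memory, but it requires both input files to be sorted by chromosome and then by start position.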
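If the inputs are not already sorted, they can be sorted first. A minimal sketch using the sort keys the bedtools documentation gives for -sorted (-k1,1 -k2,2n); sort_bed is a hypothetical helper name, and LC_ALL=C is an assumption added to keep the chromosome order identical across files.

```shell
# sort_bed: hypothetical helper. Sorts a BED file by chromosome name, then
# numerically by start position.
# LC_ALL=C makes the chromosome order byte-wise and locale-independent.
sort_bed() {
  LC_ALL=C sort -k1,1 -k2,2n "$1"
}

# Example: sort_bed cpg.bed > cpg.sorted.bed
# (bedtools sort -i cpg.bed > cpg.sorted.bed should do the same job.)
```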
Intersecting one A file with several B files
bedtools2 can intersect one A file with one or more B files.
bedtools intersect -a exons.bed -b cpg.bed gwas.bed hesc.chromHmm.bed -sorted
Simply list all the file names after -b; the other options work as described above.
Brief BED format introduction
A BED file contains three kinds of content: required fields, optional fields, and track lines.
Required fields
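- chrom: chromosome name (e.g. chr1)
- chromStart: start position; the first base of a chromosome is numbered 0
- chromEnd: end position; this base is not part of the feature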
chr1 100000000 100000010
chr1 100000030 100000060
chr2 100010000 100300010
Optional fields
- name
- score
- strand
- thickStart
- thickEnd
- itemRgb
- blockCount
- blockSizes
- blockStarts
Track lines
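Optional header lines at the start of the file (beginning with the word track) that control how a genome browser displays the data.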
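Because chromStart is 0-based and chromEnd is excluded, a feature's length is simply chromEnd - chromStart. A minimal sketch that also skips header lines, assuming they begin with track, browser or #; bed_lengths is a hypothetical helper name.

```shell
# bed_lengths: hypothetical helper. Skips track/browser/comment header lines
# and prints chrom, start, end and the feature length end - start
# (no +1 is needed because BED intervals exclude chromEnd).
bed_lengths() {
  awk 'BEGIN { OFS = "\t" }
       /^(track|browser|#)/ { next }
       { print $1, $2, $3, $3 - $2 }'
}

# Example: bed_lengths < cpg.bed
```

For the three example records under Required fields, the lengths come out as 10, 30 and 290010 bp.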