bedtools----getfasta

getfasta extracts the nucleotide sequence of each feature from a FASTA file, using the chromosome coordinates of the features listed in a BED/GFF/VCF file.

bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta>


Tip

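  1. The headers in the input FASTA file must exactly match the chromosome column in the BED file. # Note: sequence names in the FASTA file must correspond one-to-one with the chromosome names in the BED file, otherwise the lookup fails; this works like a hash in Perl or a dict in Python.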
  2. You can use the UNIX fold command to set the line width of the FASTA output. For example, fold -w 60 will make each line of the FASTA file have at most 60 nucleotides for easy viewing.
  3. BED files containing a single region require a newline character at the end of the line, otherwise a blank output file is produced.
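Tips 1 and 3 can be checked before running getfasta. The following is a minimal sketch using only standard UNIX tools; the file names genome.fa and regions.bed and their contents are made up for illustration, and only the first word of each FASTA header is compared (the default behaviour without -fullHeader).

```shell
# Scratch directory; genome.fa and regions.bed are hypothetical example files.
tmp=$(mktemp -d)
printf '>chr1 some description\nACGTACGT\n>chr2\nGGCC\n' > "$tmp/genome.fa"
# A single-region BED file written WITHOUT a trailing newline.
printf 'chrX\t0\t4' > "$tmp/regions.bed"

# Tip 1: list BED chromosome names that have no matching FASTA header.
awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$tmp/genome.fa" | sort -u > "$tmp/fa.names"
cut -f1 "$tmp/regions.bed" | sort -u > "$tmp/bed.names"
missing=$(comm -13 "$tmp/fa.names" "$tmp/bed.names")
echo "BED names missing from FASTA: $missing"

# Tip 3: append a newline if the last byte of the BED file is not one.
# Command substitution strips a trailing newline, so the result is empty
# exactly when the file already ends with a newline (or is empty).
if [ -n "$(tail -c 1 "$tmp/regions.bed")" ]; then
    echo >> "$tmp/regions.bed"
fi
```

Any name printed by the first check would need to be renamed in one of the two files (for example chr1 versus 1) before extraction.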

$ bedtools getfasta -h

Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.25.0
Summary: Extract DNA sequences into a fasta file based on feature coordinates.

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta>

Options:
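-fi     Input FASTA file # Note: build the index with samtools (samtools faidx) before use; if no .fai index exists, bedtools getfasta creates one itself on the first run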
-bed    BED/GFF/VCF file of ranges to extract from -fi
-fo     Output file(can be FASTA or TAB-delimited)
-name   Use the name field for the FASTA header
-split  Given BED12 format, extract and concatenate the sequences from the BED "blocks" (e.g., exons)
-tab    Write output in TAB delimited format. # Note: each line is name \t sequence
                - Default is FASTA format.

-s      Force strandedness. If the feature occupies the antisense strand, the sequence will be reverse complemented.
                - By default, strand information is ignored. # Note: strand-aware extraction; + features yield the forward-strand sequence, - features yield its reverse complement

-fullHeader     Use full fasta header.
                - By default, only the word before the first space or tab is used.

Examples:

By default, each sequence in the output FASTA is named "<chrom>:<start>-<end>". With -name, the sequence is named after the corresponding feature in the BED file instead.

$ cat test.fa
>chr1
AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG

$ cat test.bed
chr1 5 10

$ bedtools getfasta -fi test.fa -bed test.bed
>chr1:5-10
AAACC
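
The result AAACC follows from BED coordinates being 0-based with an exclusive end, so "chr1 5 10" covers bases 6 through 10 when counting from 1. As a rough cross-check without bedtools, the same slice can be taken with cut; the variable names here are only illustrative.

```shell
# Sequence copied from test.fa above.
sequence=AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG
start=5   # BED start, 0-based
end=10    # BED end, exclusive

# cut counts characters from 1, so the BED interval [start, end)
# corresponds to characters start+1 through end.
sub=$(printf '%s\n' "$sequence" | cut -c "$((start + 1))-$end")
echo "$sub"   # AAACC
```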

# optionally write to an output file
$ bedtools getfasta -fi test.fa -bed test.bed -fo test.fa.out

$ cat test.fa.out
>chr1:5-10
AAACC
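
For long records, the output file can be wrapped with fold as mentioned in the tips. The sketch below uses a made-up 150 bp record (long.fa is a hypothetical file name) rather than real getfasta output.

```shell
# Build a stand-in FASTA record: one header plus a single 150-base line.
tmp=$(mktemp -d)
{
    echo '>chr1:0-150'
    awk 'BEGIN { for (i = 0; i < 30; i++) printf "ACGTA"; print "" }'
} > "$tmp/long.fa"

# Wrap every line at 60 columns. The header stays intact only because it is
# shorter than 60 characters; fold does not treat header lines specially.
fold -w 60 "$tmp/long.fa" > "$tmp/wrapped.fa"
cat "$tmp/wrapped.fa"
```

The wrapped file has the header followed by lines of 60, 60, and 30 bases. If headers can exceed the chosen width (for example with -fullHeader), fold would split them too, so it is safer to wrap only the sequence lines in that case.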

# -name sets the name of each extracted sequence to the corresponding feature name in the BED file
$ cat test.fa
>chr1
AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG

$ cat test.bed
chr1 5 10 myseq

$ bedtools getfasta -fi test.fa -bed test.bed -name
>myseq
AAACC

-s # strand-aware extraction: + features yield the forward-strand sequence, - features yield its reverse complement

$ cat test.fa
>chr1
AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG

$ cat test.bed
chr1 20 25 forward 1 +
chr1 20 25 reverse 1 -

$ bedtools getfasta -fi test.fa -bed test.bed -s -name
>forward
CGCTA
>reverse
TAGCG
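
The "reverse" record is the reverse complement of the "forward" one. This can be reproduced by hand with rev and tr, shown here as a small sketch that assumes only the bases A, C, G, and T (upper or lower case) occur.

```shell
fwd=CGCTA
# rev reverses the string; tr then swaps each base for its complement.
rc=$(printf '%s\n' "$fwd" | rev | tr 'ACGTacgt' 'TGCAtgca')
echo "$rc"   # TAGCG
```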