bedtools批量提取基因组指定位置序列

之前已经介绍过很多提取序列的方法，有脚本的也有软件的，这里再介绍一种方法。

用到软件是bedtools，具体方法如下：

>Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>

Options:
-fi Input FASTA file
-bed BED/GFF/VCF file of ranges to extract from -fi
-name Use the name field for the FASTA header
-split given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
-tab Write output in TAB delimited format.
- Default is FASTA format.
-s Force strandedness. If the feature occupies the antisense,
strand, the sequence will be reverse complemented.
- By default, strand information is ignored.
-fullHeader Use full fasta header.
- By default, only the word before the first space or tab is used.
其中-fi 指定基因组fasta文件，-bed 指定要提取序列的位置文件，可以是bed、gff 或 vcf 文件（染色体碱基位置从0开始计数）。
-tab 指定输出格式。

$bedtools getfasta -fi GCA_001651475.1_Ler_Assembly_genomic.fna -bed id.bed

>CM004359.1:0-10
gtttagggtt
>CM004359.1:100-200
ttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggttt
>CM004359.1:1000-1050
TTGTGGgaaaattatttagttgtaGGGATGAAGTCTTTCTTCGTTGTTGT

$bedtools getfasta -fi GCA_001651475.1_Ler_Assembly_genomic.fna -bed id.bed -tab

>CM004359.1:0-10 gtttagggtt
>CM004359.1:100-200 ttagggtttagg gtttagggtttagggtttagggttta gggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggttt
>CM004359.1:1000-1050 TTGTGGgaaaattatttagttgtaGGGATGAAGTCTTTCTTCGTTGTTGT

bedtools批量提取基因组指定位置序列

推荐阅读更多精彩内容