bedtools nuc命令：按区域统计碱基比率和个数

nuc 帮助文档

bedtools nuc;用哪个参数看哪个，对着命令来看帮助文档，那就是2分钟的事情，全篇看帮助文档，既浪费时间，也找不到重点，过两天就忘了，所以你总花了很多时间，却学不会生信。

Tool:    bedtools nuc (aka nucBed)
Version: v2.28.0
Summary: Profiles the nucleotide content of intervals in a fasta file.

Usage:   bedtools nuc [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>

Options: 
    -fi Input FASTA file

    -bed    BED/GFF/VCF file of ranges to extract from -fi

    -s  Profile the sequence according to strand.

    -seq    Print the extracted sequence

    -pattern    Report the number of times a user-defined sequence
            is observed (case-sensitive).

    -C  Ignore case when matching -pattern. By defaulty, case matters.

    -fullHeader Use full fasta header.
        - By default, only the word before the first space or tab is used.

Output format: 
    The following information will be reported after each BED entry:
        1) %AT content
        2) %GC content
        3) Number of As observed
        4) Number of Cs observed
        5) Number of Gs observed
        6) Number of Ts observed
        7) Number of Ns observed
        8) Number of other bases observed
        9) The length of the explored sequence/interval.
        10) The seq. extracted from the FASTA file. (opt., if -seq is used)
        11) The number of times a user's pattern was observed.
            (opt., if -pattern is used.)
# 结果的格式解读

命令

先生成第一个文件，看这里：https://www.jianshu.com/p/7efe0d1139a6
第二步处理：$ bedtools nuc -fi /public/reference/genome/hg38/hg38.fa -bed 200K.genome.3col >nuc.res

# fasta配置fai文件，必须和fasta存在于同一目录下
# samtools faidx /public/reference/genome/hg38/hg38.fa
# 输出结果：/public/reference/genome/hg38/hg38.fa.fai
$ bedtools nuc -fi /public/reference/genome/hg38/hg38.fa -bed 200K.genome.3col >hg38.gcstat.txt

输入文件

200K.genome.3col示例.png

结果文件

按区域统计出%AT、%GC、A、C、G、T、N的个数、其他碱基个数、统计的长度

#1_usercol  2_usercol   3_usercol   4_pct_at    5_pct_gc    6_num_A 7_num_C 8_num_G 9_num_T 10_num_N    11_num_oth  12_seq_len
chr1    0   200000  0.529890    0.420110    55672   42888   41134   50306   10000   0  200000
chr1    200000  400000  0.279935    0.220065    28770   22308   21705   27217   100000  0  200000