nuc 帮助文档
bedtools nuc;用哪个参数看哪个,对着命令来看帮助文档,那就是2分钟的事情,全篇看帮助文档,既浪费时间,也找不到重点,过两天就忘了,所以你总花了很多时间,却学不会生信。
Tool: bedtools nuc (aka nucBed)
Version: v2.28.0
Summary: Profiles the nucleotide content of intervals in a fasta file.
Usage: bedtools nuc [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>
Options:
-fi Input FASTA file
-bed BED/GFF/VCF file of ranges to extract from -fi
-s Profile the sequence according to strand.
-seq Print the extracted sequence
-pattern Report the number of times a user-defined sequence
is observed (case-sensitive).
-C Ignore case when matching -pattern. By defaulty, case matters.
-fullHeader Use full fasta header.
- By default, only the word before the first space or tab is used.
Output format:
The following information will be reported after each BED entry:
1) %AT content
2) %GC content
3) Number of As observed
4) Number of Cs observed
5) Number of Gs observed
6) Number of Ts observed
7) Number of Ns observed
8) Number of other bases observed
9) The length of the explored sequence/interval.
10) The seq. extracted from the FASTA file. (opt., if -seq is used)
11) The number of times a user's pattern was observed.
(opt., if -pattern is used.)
# 结果的格式解读
命令
先生成第一个文件,看这里:https://www.jianshu.com/p/7efe0d1139a6
第二步处理:$ bedtools nuc -fi /public/reference/genome/hg38/hg38.fa -bed 200K.genome.3col >nuc.res
# fasta配置fai文件,必须和fasta存在于同一目录下
# samtools faidx /public/reference/genome/hg38/hg38.fa
# 输出结果:/public/reference/genome/hg38/hg38.fa.fai
$ bedtools nuc -fi /public/reference/genome/hg38/hg38.fa -bed 200K.genome.3col >hg38.gcstat.txt
输入文件
200K.genome.3col示例.png
结果文件
按区域统计出%AT、%GC、A、C、G、T、N的个数、其他碱基个数、统计的长度
#1_usercol 2_usercol 3_usercol 4_pct_at 5_pct_gc 6_num_A 7_num_C 8_num_G 9_num_T 10_num_N 11_num_oth 12_seq_len
chr1 0 200000 0.529890 0.420110 55672 42888 41134 50306 10000 0 200000
chr1 200000 400000 0.279935 0.220065 28770 22308 21705 27217 100000 0 200000