Subread package: high-performance read alignment, quantification and mutation discovery
The Subread package comprises a suite of software programs for processing next-generation sequencing read data, including:
- Subread: a general-purpose read aligner which can align both genomic DNA-seq and RNA-seq reads. It can also be used to discover genomic mutations including short indels and structural variants.
- Subjunc: a read aligner developed for aligning RNA-seq reads and for the detection of exon-exon junctions. Gene fusion events can be detected as well.
- featureCounts: a software program developed for counting reads to genomic features such as genes, exons, promoters and genomic bins.
- Sublong: a long-read aligner that is designed based on seed-and-vote.
- exactSNP: a SNP caller that discovers SNPs by testing signals against local background noises.
Download the latest version of Subread (2.0.3 at the time of writing):
wget https://nchc.dl.sourceforge.net/project/subread/subread-2.0.3/subread-2.0.3-Linux-x86_64.tar.gz
tar -zxvf subread-2.0.3-Linux-x86_64.tar.gz
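Once unpacked, the binaries live in the bin/ subdirectory of the extracted folder. A minimal sketch for making them callable in the current session, assuming the archive was unpacked in the current working directory (SUBREAD_HOME is a variable name chosen here for illustration):

```shell
# Assumption: the archive was unpacked in the current directory.
SUBREAD_HOME="$PWD/subread-2.0.3-Linux-x86_64"

# Put the bundled binaries (featureCounts, subread-align, ...) on PATH for this session.
export PATH="$SUBREAD_HOME/bin:$PATH"

# Print the version if the binary is reachable; otherwise report where we looked.
if command -v featureCounts >/dev/null 2>&1; then
    featureCounts -v
else
    echo "featureCounts not found under $SUBREAD_HOME/bin" >&2
fi
```

To make the change permanent, the export line could be appended to ~/.bashrc with the absolute path written out.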
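Use featureCounts to count the reads in each BAM file that are assigned to each gene
- -T: number of CPU threads to use.
- -t: feature type in the annotation to count against; the default is exon.
- -p: declares that the BAM files come from paired-end sequencing. With paired-end data, featureCounts reports an error if this option is omitted. Since version 2.0.2, -p on its own still counts individual reads; add --countReadPairs to count fragments (read pairs) instead.
- -g: attribute type in the GTF annotation used to group features into meta-features (such as genes) for counting; the default is gene_id.
- -a: path to the annotation file (GTF in this example).
- -o: name and path of the output file.
- Finally, list the paths of the BAM files to process.
- The BAM files counted here were sorted beforehand with samtools sort.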
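The sorting step mentioned in the last item could look roughly like the sketch below. The naming scheme (sample.bam becomes sample_sorted.bam), the helper name sorted_name, and the thread count are assumptions made here to match the *_sorted.bam pattern used later, not something the original notes specify.

```shell
# Hypothetical naming convention: sample.bam -> sample_sorted.bam
sorted_name() {
    printf '%s_sorted.bam\n' "${1%.bam}"
}

for bam in ./*.bam; do
    [ -e "$bam" ] || continue                        # no BAM files in this directory
    case "$bam" in *_sorted.bam) continue ;; esac    # skip files that are already sorted
    # Coordinate-sort with 8 additional threads and write to the derived name.
    samtools sort -@ 8 -o "$(sorted_name "$bam")" "$bam"
done
```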
featureCounts -T 8 -p -t exon -g gene_id -a /root/Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gtf -o all_expression_count.out ./*_sorted.bam
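Alongside the count table, featureCounts writes a companion file named after the output with a .summary suffix (here all_expression_count.out.summary), which tallies reads per status such as Assigned or Unassigned_NoFeatures. As a quick sanity check, the share of assigned reads per sample can be computed from it. This is a sketch; assigned_rate is a helper name introduced here, and the percentage is taken relative to the sum of all status rows in the summary.

```shell
# Print, for each sample column, the percentage of reads with status "Assigned".
assigned_rate() {
    awk -F'\t' '
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
        {
            for (i = 2; i <= NF; i++) {
                total[i] += $i                        # sum over every status row
                if ($1 == "Assigned") hit[i] = $i     # remember the Assigned row
            }
        }
        END {
            for (i = 2; i <= n; i++)
                printf "%s\t%.1f%%\n", name[i], (total[i] ? 100 * hit[i] / total[i] : 0)
        }
    ' "$1"
}

summary=all_expression_count.out.summary
if [ -f "$summary" ]; then
    assigned_rate "$summary"
fi
```

A low assigned percentage often points to a mismatch between the annotation and the reference used for alignment, or to an unsuitable -t or -g value.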
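Keep only the gene ID column and the per-sample count columns of all_expression_count.out (column 1, and column 7 onward). The first line of the file is a "#" comment recording the command; it contains no tabs, so cut passes it through unchanged.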
cut -f1,7- all_expression_count.out > counts.txt
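For downstream tools that expect a plain matrix, it can help to drop the leading "#" comment line and shorten the sample column names, which featureCounts takes from the BAM paths given on the command line. A sketch under the assumption that the inputs were passed as ./NAME_sorted.bam, as in the command above; clean_counts and counts_clean.txt are names made up for this example.

```shell
# Drop the comment line, keep Geneid plus count columns, tidy the header.
clean_counts() {
    grep -v '^#' "$1" |
        cut -f1,7- |
        sed -e '1s#\./##g' -e '1s#_sorted\.bam##g'   # header only: strip "./" and "_sorted.bam"
}

if [ -f all_expression_count.out ]; then
    clean_counts all_expression_count.out > counts_clean.txt
fi
```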