htseq-count: counting reads within features — HTSeq 2.0.3 documentation
In the case of RNA-Seq, the features are typically genes, where each gene is considered here as the union of all its exons. One may also consider each exon as a feature, e.g., in order to check for alternative splicing. For comparative ChIP-Seq, the features might be binding regions from a pre-determined list.
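To make "the union of all its exons" concrete, here is a minimal sketch that merges overlapping exon intervals per gene. It is an illustration only, not HTSeq's own implementation; the function name `union_of_exons`, the tuple layout, the half-open coordinates, and the toy gene names are all assumptions made for this example.

```python
from collections import defaultdict


def union_of_exons(exons):
    """Merge the exons of each gene into sorted, disjoint intervals.

    `exons` is an iterable of (gene_id, start, end) tuples with half-open
    coordinates [start, end). This layout is an assumption of the sketch.
    """
    by_gene = defaultdict(list)
    for gene_id, start, end in exons:
        by_gene[gene_id].append((start, end))

    merged = {}
    for gene_id, intervals in by_gene.items():
        intervals.sort()
        result = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = result[-1]
            if start <= last_end:
                # Overlaps or touches the previous interval: extend it.
                result[-1] = (last_start, max(last_end, end))
            else:
                result.append((start, end))
        merged[gene_id] = result
    return merged


# Toy annotation (made-up values): two exons of geneA overlap.
toy_exons = [
    ("geneA", 100, 200),
    ("geneA", 150, 300),
    ("geneA", 500, 600),
    ("geneB", 1000, 1100),
]
gene_models = union_of_exons(toy_exons)
```

With exon-level features, the merge step would simply be skipped and each exon kept as its own interval.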
Important: The default for strandedness is yes. If your RNA-Seq data has not been made with a strand-specific protocol, this causes half of the reads to be lost. Hence, make sure to set the option --stranded=no unless you have strand-specific data!
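The effect of the strandedness setting can be illustrated with a toy model. The sketch below is not HTSeq code: it assumes every read already overlaps a single feature and models only the strand check, using a hypothetical helper `count_reads` and a made-up library in which reads fall on either strand equally often.

```python
def count_reads(read_strands, feature_strand, stranded):
    """Count reads assigned to one feature under a simplified strand rule.

    All reads are assumed to overlap the feature; only the strand
    comparison is modeled here.
    """
    if stranded == "no":
        # Strand is ignored: every overlapping read counts.
        return len(read_strands)
    if stranded == "yes":
        # Only reads on the same strand as the feature count.
        return sum(1 for s in read_strands if s == feature_strand)
    if stranded == "reverse":
        # Only reads on the opposite strand count.
        return sum(1 for s in read_strands if s != feature_strand)
    raise ValueError(f"unknown strandedness setting: {stranded!r}")


# Made-up unstranded library: 100 reads, half on each strand.
unstranded_reads = ["+", "-"] * 50

counted_as_stranded = count_reads(unstranded_reads, "+", "yes")
counted_as_unstranded = count_reads(unstranded_reads, "+", "no")
```

In this toy setup, treating the unstranded library as stranded keeps only the reads that happen to lie on the feature's strand, which is the loss the warning above describes.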
Important: For paired-end reads, although position-sorted BAM files are supported, unsorted BAM files (i.e. in which the two reads of the pair are in consecutive lines of the BAM file) are highly recommended for htseq-count. If you are having trouble or unexpected results, sort your BAM file by name and try again.
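As a rough sketch of the "sort by name and try again" advice, the snippet below assembles the two command lines without running them. The file names are placeholders, and the helper `name_sorted_count_commands` is hypothetical. The sketch assumes `samtools` is the sorting tool (the text above does not name one) and uses `samtools sort -n` together with the `--format` and `--order` options of htseq-count.

```python
import shlex


def name_sorted_count_commands(bam_path, sorted_bam_path, gtf_path):
    """Build (but do not run) commands for name-sorting and counting.

    Assumes samtools and htseq-count are installed and on the PATH.
    """
    # samtools sort -n orders alignments by read name, so mates end up adjacent.
    sort_cmd = ["samtools", "sort", "-n", "-o", sorted_bam_path, bam_path]
    # Tell htseq-count that the input is a BAM file sorted by read name.
    count_cmd = [
        "htseq-count",
        "--format=bam",
        "--order=name",
        sorted_bam_path,
        gtf_path,
    ]
    return sort_cmd, count_cmd


# Placeholder file names, chosen only for illustration.
sort_cmd, count_cmd = name_sorted_count_commands(
    "alignments.bam", "alignments.namesorted.bam", "annotation.gtf"
)
printable = [shlex.join(cmd) for cmd in (sort_cmd, count_cmd)]
```

The lists can be passed to `subprocess.run` as they are. Other options, including strandedness, are left at their defaults in this sketch and should be set to match your data.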
Feature type (3rd column in GTF file) to be used; all features of other types are ignored (default, suitable for RNA-Seq analysis using an Ensembl GTF file: exon).
The default, suitable for RNA-Seq analysis using an Ensembl GTF file, is gene_id.
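The following sketch shows how a feature type and an ID attribute pick records out of a GTF file. It is a simplified stand-alone parser, not the one HTSeq uses; the function name `parse_gtf_line` and the toy records with made-up identifiers are assumptions. As a further simplification, a record lacking the ID attribute is skipped here.

```python
import re


def parse_gtf_line(line, feature_type="exon", id_attribute="gene_id"):
    """Return (feature_id, chrom, start, end, strand) or None if skipped.

    A line is skipped if it is a comment, does not have 9 tab-separated
    columns, has a different feature type in column 3, or lacks the
    requested ID attribute in column 9.
    """
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9 or fields[2] != feature_type:
        return None
    # Column 9 holds attributes of the form: key "value"; key "value";
    attributes = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
    if id_attribute not in attributes:
        return None
    return attributes[id_attribute], fields[0], int(fields[3]), int(fields[4]), fields[6]


# Toy GTF records with made-up identifiers and coordinates.
toy_gtf = [
    '1\ttoy\tgene\t100\t600\t.\t+\t.\tgene_id "GENE0001";',
    '1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "GENE0001"; transcript_id "TX0001";',
    '1\ttoy\texon\t500\t600\t.\t+\t.\tgene_id "GENE0001"; transcript_id "TX0001";',
]
selected = [rec for rec in map(parse_gtf_line, toy_gtf) if rec is not None]
```

Only the two `exon` records survive the filter, and both are keyed by their `gene_id`, so reads overlapping either exon would be counted toward the same gene.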
Quality Assessment with htseq-qa
Given a FASTQ or SAM file, this script produces a PDF file with plots depicting the base calls and base-call qualities by position in the read. This is useful to assess the technical quality of a sequencing run.
htseq-count: counting reads within features
Given one/multiple SAM/BAM/CRAM files with alignments and a GTF file with genomic features, this script counts how many reads map to each feature. This script is especially popular for bulk and single-cell RNA-Seq analysis.
htseq-count-barcodes: counting reads with cell barcodes and UMIs
Similar to htseq-count, but for a single SAM/BAM/CRAM file containing reads with cell and molecular barcodes (e.g. 10X Genomics cellranger output). This script enables customization of single-cell RNA-Seq pipelines, e.g. to quantify exon-level expression or simply to obtain a count matrix that contains chromosome information and additional feature metadata.