BLAST notes

Tools

blastn

-outfmt 6 default columns

-outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore"

pident appears to equal 100 * nident / length, i.e. the share of identical positions over the alignment length, expressed as a percentage.
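
A small worked check of that relationship, as a sketch only: it assumes pident is 100 * nident / length, which matches the observation above but is not taken from the BLAST source code. The function name and the sample numbers are made up for illustration.

```python
def percent_identity(nident: int, length: int) -> float:
    """Percentage of identical positions over the alignment length (gap columns included)."""
    if length <= 0:
        raise ValueError("alignment length must be positive")
    return 100.0 * nident / length


# Hypothetical hit: 95 identical positions in a 100-column alignment.
example_pident = percent_identity(95, 100)
print(f"{example_pident:.3f}")
```

To test the assumption on real data, add nident and length to the -outfmt string and compare the computed value with the reported pident column, allowing for rounding.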

blastn supported parameters

qseqid    Query Seq-id
qgi       Query GI
qacc      Query accession
qaccver   Query accession.version
qlen      Query sequence length
sseqid    Subject Seq-id
sallseqid All subject Seq-id(s), separated by a ';'
sgi       Subject GI
sallgi    All subject GIs
sacc      Subject accession
saccver   Subject accession.version
sallacc   All subject accessions
slen      Subject sequence length
qstart    Start of alignment in query
qend      End of alignment in query
sstart    Start of alignment in subject
send      End of alignment in subject
qseq      Aligned part of query sequence
sseq      Aligned part of subject sequence
evalue    Expect value
bitscore  Bit score
score     Raw score
length    Alignment length
pident    Percentage of identical matches
nident    Number of identical matches
mismatch  Number of mismatches
positive  Number of positive-scoring matches
gapopen   Number of gap openings
gaps      Total number of gaps
ppos      Percentage of positive-scoring matches
frames    Query and subject frames separated by a '/'
qframe    Query frame
sframe    Subject frame
btop      Blast traceback operations (BTOP)
staxids   Subject Taxonomy ID(s), separated by a ';'
sscinames Subject Scientific Name(s), separated by a ';'
scomnames Subject Common Name(s), separated by a ';'
sblastnames Subject Blast Name(s), separated by a ';' (in alphabetical order)
sskingdoms  Subject Super Kingdom(s), separated by a ';' (in alphabetical order)
stitle      Subject Title
salltitles  All Subject Title(s), separated by a '<>'
sstrand   Subject Strand
qcovs     Query Coverage Per Subject
qcovhsp   Query Coverage Per HSP

Example

blastn -db Sub_database -query query.fa -num_threads 10 -evalue 1e-6 \
    -outfmt '6 qseqid sseqid pident nident qlen slen evalue bitscore' -out query_blastn.txt
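
The tabular output has no header line, so the column names have to come from the -outfmt string itself. The sketch below shows one way to read such a file into dictionaries. The helper name parse_blast_tab, the NUMERIC conversion table, and the sample row are assumptions made for illustration, not part of BLAST.

```python
import csv
import io

# Same column list as in the command above.
OUTFMT = "6 qseqid sseqid pident nident qlen slen evalue bitscore"

# Columns to convert from text to numbers (illustrative choice).
NUMERIC = {"pident": float, "nident": int, "qlen": int, "slen": int,
           "evalue": float, "bitscore": float}


def parse_blast_tab(handle, outfmt=OUTFMT):
    """Yield one dict per hit, keyed by the column names in the outfmt string."""
    columns = outfmt.split()[1:]  # drop the leading "6"
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(columns, row))
        for name, cast in NUMERIC.items():
            if name in hit:
                hit[name] = cast(hit[name])
        yield hit


# Made-up sample row standing in for the contents of query_blastn.txt.
sample = io.StringIO("q1\ts1\t98.000\t98\t100\t250\t1e-40\t170\n")
hits = list(parse_blast_tab(sample))
print(hits[0]["sseqid"], hits[0]["nident"] / hits[0]["qlen"])
```

For a real run, replace the StringIO object with open("query_blastn.txt") and pass the same -outfmt string that was given to blastn.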

word_size

Changing the initial word-size can either find more, but less accurate, hits or limit the results to almost perfect hits.

Decreasing the word-size increases the number of detected homologous sequences, but the hits may include more fragmented alignments, broken up by gaps and substitutions (example: searching for homologous genes between distant species; see also -task blastn).
Increasing the word-size gives fewer hits, because it requires a longer continuous region of exact match. If the word-size is chosen to be almost the size of the query, BLAST will search for almost exact matches (example: locating gene sequences in the genome they originally came from).

For short sequences, the word-size must be less than half the query length; otherwise reliable hits can be missed.
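
The rule of thumb for short queries can be written as a quick check before launching a search. This is only a sketch of the "less than half the query length" rule stated above; both function names are hypothetical, and the check does not account for any other limits BLAST places on -word_size.

```python
def is_word_size_safe(word_size: int, query_length: int) -> bool:
    """True if word_size is strictly less than half the query length."""
    return 2 * word_size < query_length


def largest_safe_word_size(query_length: int) -> int:
    """Largest integer word size that is strictly less than half the query length."""
    if query_length < 3:
        raise ValueError("query is too short for any word size under this rule")
    return (query_length - 1) // 2


# A 20-nt query (for example a primer) with a word size of 11 fails the rule.
print(is_word_size_safe(11, 20), largest_safe_word_size(20))
```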

max_target_seqs

Running BLAST with ‘-max_target_seqs 1’ simply returns the first good hit found in the database, not the best hit as one would assume (Shah et al., 2019). Worse yet, the output depends on the order in which the sequences occur in the database.
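
One possible workaround, sketched here under assumptions, is to run the search with a larger -max_target_seqs value and pick the best hit per query afterwards. The code assumes the hits have already been parsed into dictionaries with qseqid, sseqid and bitscore keys; the function name and the sample hits are invented for illustration.

```python
def best_hit_per_query(hits):
    """Return {qseqid: hit} keeping the hit with the highest bit score per query.

    On a tie, the hit seen first is kept.
    """
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Made-up hits: the first hit listed for q1 is not the best one.
hits = [
    {"qseqid": "q1", "sseqid": "sA", "bitscore": 120.0},
    {"qseqid": "q1", "sseqid": "sB", "bitscore": 180.0},
    {"qseqid": "q2", "sseqid": "sC", "bitscore": 95.0},
]
best = best_hit_per_query(hits)
print({query: hit["sseqid"] for query, hit in best.items()})
```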

E-value & Bit-score

The smaller the E-value, the better the match.
The higher the bit-score, the better the sequence similarity.
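
The two values are linked: for a bit score S', the expected number of chance hits is approximately E = m * n * 2^(-S'), where m is the query length and n is the database length. The sketch below uses raw lengths for simplicity, whereas BLAST works with effective lengths, so the numbers are illustrative only and will not reproduce BLAST's reported E-values exactly. The function name is hypothetical.

```python
def expected_hits(bit_score: float, query_length: int, database_length: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2 ** (-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Same query and database size, two different bit scores.
for bits in (30, 40, 50):
    print(bits, expected_hits(bits, 100, 10**6))
```

Each extra bit halves the E-value, and the same bit score gives a larger E-value in a larger database, which is why E-values from searches against different databases are not directly comparable.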

-max_hsps

The maximum number of HSPs (alignments) to keep for any single query-subject pair, ranked by e-value.
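
The effect of -max_hsps can be imitated on output that has already been parsed, which helps when the search itself cannot be rerun. The sketch assumes hits are dictionaries with qseqid, sseqid and evalue keys and that ranking by e-value is the intended order. BLAST's own selection may differ in detail, and the function name is hypothetical.

```python
from collections import defaultdict


def keep_top_hsps(hits, max_hsps):
    """Keep at most max_hsps hits per (query, subject) pair, lowest e-value first."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[(hit["qseqid"], hit["sseqid"])].append(hit)
    kept = []
    for pair_hits in grouped.values():
        pair_hits.sort(key=lambda h: h["evalue"])
        kept.extend(pair_hits[:max_hsps])
    return kept


# Made-up HSPs: three for the pair q1/s1, one for q1/s2.
hsps = [
    {"qseqid": "q1", "sseqid": "s1", "evalue": 1e-5},
    {"qseqid": "q1", "sseqid": "s1", "evalue": 1e-30},
    {"qseqid": "q1", "sseqid": "s1", "evalue": 1e-10},
    {"qseqid": "q1", "sseqid": "s2", "evalue": 1e-3},
]
kept = keep_top_hsps(hsps, 2)
print([(h["sseqid"], h["evalue"]) for h in kept])
```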

Descriptions

If you are particularly interested in the descriptions of the matched sequences (useful not just for human interpretation but also for searching for keywords such as an enzyme or organism name), add stitle to the output columns; it reports the title/description of each match.

~/miniconda3/pkgs/blast-2.10.1-pl526he19e7b1_1/bin/blastp -num_threads 10 \
  -evalue 1e-5 -outfmt '6 qseqid sseqid pident nident qlen slen evalue stitle' \
  -db ath.db -query pep.fa -out blast.out -subject_besthit
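
Once stitle is in the output, keyword searches reduce to a filter on that column. The sketch below assumes the column order of the command above, where stitle is the eighth field (index 7); the function name and the sample rows are invented for illustration.

```python
def filter_by_title(lines, keyword, title_column=7):
    """Yield the split fields of each hit whose stitle contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > title_column and keyword in fields[title_column].lower():
            yield fields


# Made-up rows standing in for the contents of blast.out.
rows = [
    "q1\ts1\t90.000\t90\t100\t120\t1e-30\thypothetical Kinase protein\n",
    "q1\ts2\t85.000\t85\t100\t140\t1e-20\tputative transporter\n",
]
matches = list(filter_by_title(rows, "kinase"))
print([fields[1] for fields in matches])
```

For a real file, pass an open file handle instead of the list, for example open("blast.out").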

References (copied from)

https://www.metagenomics.wiki/tools/blast/default-word-size
https://www.metagenomics.wiki/tools/blast/blastn-output-format-6
https://www.metagenomics.wiki/tools/blast/evalue
https://www.ncbi.nlm.nih.gov/books/NBK279684/#appendices.Options_for_the_commandline_a
The Biostar Handbook: 2nd Edition
Nidhi Shah, Michael G Nute, Tandy Warnow, Mihai Pop, Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows, Bioinformatics, Volume 35, Issue 9, 1 May 2019, Pages 1613–1614, https://doi.org/10.1093/bioinformatics/bty833