Tools
blastn
-outfmt 6 default columns
-outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore"
pident appears to equal nident / length, expressed as a percentage (100 * nident / length).
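One way to check this relationship on real output is to request nident and length alongside pident and recompute the percentage. The sketch below assumes a five-column, tab-separated layout (qseqid sseqid pident nident length), which differs from the default layout above. The function name recompute_pident and the sample rows are made up for illustration.

```shell
# Append a recomputed percent identity (100 * nident / length) to each row.
# Assumed input columns (tab-separated): qseqid sseqid pident nident length
recompute_pident() {
    awk -F'\t' '$5 > 0 { printf "%s\t%s\t%s\t%.3f\n", $1, $2, $3, 100 * $4 / $5 }'
}

# Hypothetical sample rows, not real BLAST output.
printf 'q1\ts1\t95.000\t95\t100\nq2\ts2\t87.500\t175\t200\n' | recompute_pident
```

If the third and fourth output columns agree for every row, the reported pident matches nident / length for that data set.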
blastn supported format specifiers
qseqid Query Seq-id
qgi Query GI
qacc Query accession
qaccver Query accession.version
qlen Query sequence length
sseqid Subject Seq-id
sallseqid All subject Seq-id(s), separated by a ';'
sgi Subject GI
sallgi All subject GIs
sacc Subject accession
saccver Subject accession.version
sallacc All subject accessions
slen Subject sequence length
qstart Start of alignment in query
qend End of alignment in query
sstart Start of alignment in subject
send End of alignment in subject
qseq Aligned part of query sequence
sseq Aligned part of subject sequence
evalue Expect value
bitscore Bit score
score Raw score
length Alignment length
pident Percentage of identical matches
nident Number of identical matches
mismatch Number of mismatches
positive Number of positive-scoring matches
gapopen Number of gap openings
gaps Total number of gaps
ppos Percentage of positive-scoring matches
frames Query and subject frames separated by a '/'
qframe Query frame
sframe Subject frame
btop Blast traceback operations (BTOP)
staxids Subject Taxonomy ID(s), separated by a ';'
sscinames Subject Scientific Name(s), separated by a ';'
scomnames Subject Common Name(s), separated by a ';'
sblastnames Subject Blast Name(s), separated by a ';' (in alphabetical order)
sskingdoms Subject Super Kingdom(s), separated by a ';' (in alphabetical order)
stitle Subject Title
salltitles All Subject Title(s), separated by a '<>'
sstrand Subject Strand
qcovs Query Coverage Per Subject
qcovhsp Query Coverage Per HSP
example
blastn -db Sub_database -query query.fa -num_threads 10 -evalue 1e-6 \
-outfmt '6 qseqid sseqid pident nident qlen slen evalue bitscore' -out query_blastn.txt
word_size
Changing the initial word-size can help to find more (but less accurate) hits, or to limit the results to almost perfect hits.
Decreasing the word-size will increase the number of detected homologous sequences, but hits can include more fragmented alignments due to gaps and substitutions (example: search for homologous genes between distant species, see also: -task blastn).
Increasing the word-size will give fewer hits, as it requires longer continuous regions of exact match. If the word-size is chosen to be almost the size of the query, BLAST will search for almost exact matches (example: search for the location of gene sequences in the original genome of the gene).
For short sequences, word-size must be less than half the query length, otherwise reliable hits can be missed.
max_target_seqs
The invocation using the parameter ‘-max_target_seqs 1’ simply returns the first good hit found in the database, not the best hit as one would assume. Worse yet, the output produced depends on the order in which the sequences occur in the database.
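A common workaround is to run the search with a larger -max_target_seqs value and pick the top hit per query afterwards. The sketch below assumes the default 12-column -outfmt 6 layout, where bitscore is column 12. The function name best_hit_per_query and the sample rows are hypothetical. It can only choose among the hits that BLAST actually reported.

```shell
# Keep, for each query, the row with the highest bitscore (column 12).
# Queries are printed in the order they first appear in the input.
best_hit_per_query() {
    awk -F'\t' '
        !($1 in score) || $12 + 0 > score[$1] { score[$1] = $12 + 0; best[$1] = $0 }
        !($1 in seen) { seen[$1] = 1; order[++n] = $1 }
        END { for (i = 1; i <= n; i++) print best[order[i]] }
    '
}

# Hypothetical sample rows in default outfmt 6 column order.
printf 'q1\tsA\t98.000\t100\t2\t0\t1\t100\t1\t100\t1e-40\t170\nq1\tsB\t100.000\t100\t0\t0\t1\t100\t1\t100\t1e-45\t185\nq2\tsC\t90.000\t50\t5\t0\t1\t50\t1\t50\t1e-10\t60\n' | best_hit_per_query
```

Ties are resolved in favour of the row seen first, so the result for tied hits still depends on the input order.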
E-value & Bit-score
- The smaller the E-value, the better the match.
- The higher the bit-score, the better the sequence similarity
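As a small illustration of using the E-value as a cutoff after the search, the sketch below keeps only rows at or below a chosen threshold. It assumes the default 12-column -outfmt 6 layout, where evalue is column 11. The function name filter_by_evalue and the sample rows are hypothetical.

```shell
# Keep rows whose E-value (column 11) is at or below the given maximum.
# $1: maximum E-value to keep, for example 1e-6
filter_by_evalue() {
    awk -F'\t' -v max="$1" '$11 + 0 <= max + 0'
}

# Hypothetical sample rows: only the first passes a 1e-6 cutoff.
printf 'q1\tsA\t98.000\t100\t2\t0\t1\t100\t1\t100\t1e-40\t170\nq1\tsB\t80.000\t20\t4\t0\t1\t20\t1\t20\t0.5\t22.3\n' | filter_by_evalue 1e-6
```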
max_hsps
The maximum number of HSPs (alignments) to keep for any single query-subject pair, ranked by E-value.
Descriptions
If the descriptions of the matched sequences are of interest (useful not just for human interpretation but also for searching for keywords such as an enzyme or organism name), adding stitle to the -outfmt column list will include the titles/descriptions of the matches.
~/miniconda3/pkgs/blast-2.10.1-pl526he19e7b1_1/bin/blastp -num_threads 10 \
-evalue 1e-5 -outfmt '6 qseqid sseqid pident nident qlen slen evalue stitle' \
-db ath.db -query pep.fa -out blast.out -subject_besthit
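With stitle in the output, keyword searches can be limited to the description column instead of grepping the whole line. The sketch below assumes the eight-column layout of the command above, where stitle is column 8. The function name hits_with_title_keyword and the sample titles are made up for illustration.

```shell
# Print rows whose subject title (column 8) contains the keyword,
# ignoring case. $1: keyword to look for, for example "kinase"
hits_with_title_keyword() {
    awk -F'\t' -v kw="$1" 'index(tolower($8), tolower(kw)) > 0'
}

# Hypothetical sample rows, not real BLAST output.
printf 'q1\ts1\t98.500\t197\t200\t210\t1e-50\texample protein kinase\nq2\ts2\t75.000\t150\t200\t190\t1e-20\texample transporter\n' | hits_with_title_keyword KINASE
```

Matching on a single column avoids false positives from sequence identifiers that happen to contain the keyword.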
Reference (copied from)
https://www.metagenomics.wiki/tools/blast/blastn-output-format-6
https://www.ncbi.nlm.nih.gov/books/NBK279684/#appendices.Options_for_the_commandline_a
The Biostar Handbook: 2nd Edition
Nidhi Shah, Michael G Nute, Tandy Warnow, Mihai Pop, Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows, Bioinformatics, Volume 35, Issue 9, 1 May 2019, Pages 1613–1614, https://doi.org/10.1093/bioinformatics/bty833