BLAST (Basic Local Alignment Search Tool) is a set of programs designed to perform similarity
searches on all available sequence data. Scientists frequently use such
searches to gain insight into the function and biological importance of
gene products.
BLAST is an algorithm that searches for sequences related to a query sequence.
It does this by performing local alignments
(the alignment of some portion of two sequences) as opposed to global alignments
(the alignment of two sequences over their entire length). Sequences with
"good" alignments are putatively related to each other and are presumed to have similar
structure and function.
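The difference between local and global alignment can be illustrated with a minimal sketch. It uses the textbook dynamic-programming methods (Needleman-Wunsch for global, Smith-Waterman for local), not the faster heuristic that BLAST itself applies. The function names, the example sequences, and the scoring values (+1 match, -1 mismatch, -1 gap) are assumptions chosen for illustration only.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score for aligning a and b over their entire length."""
    # Row 0: aligning a prefix of b against nothing costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score for aligning any portion of a with any portion of b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, so an alignment may start anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Hypothetical sequences that share only the internal stretch GATTACA.
query = "AAAAGATTACATTTT"
subject = "CCGATTACAGG"
print("local score: ", local_score(query, subject))
print("global score:", global_score(query, subject))
```

The local score rewards the shared stretch alone, while the global score is pulled down by the unrelated flanking residues that must also be aligned.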
The BLAST algorithm comes in a number of flavors, a few of which are:
NAME | QUERY | SUBJECT | DATABASE | ACTION
BLASTN | nucleotide | nucleotide | nucleotide | Compares a nucleotide query sequence against a nucleotide sequence database.
BLASTP | protein | protein | protein | Compares an amino acid query sequence with others stored in protein sequence databases.
BLASTX | nucleotide | protein | protein | Compares a nucleotide query sequence translated in all 6 reading frames with amino acid sequences stored in protein sequence databases.
TBLASTN | protein | translated nucleotide | nucleotide | Compares an amino acid query sequence with nucleotide sequences translated in all 6 reading frames.
TBLASTX | translated nucleotide | translated nucleotide | nucleotide | Compares a nucleotide query sequence translated in all 6 reading frames with nucleotide sequences translated in all 6 reading frames.
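Several of the programs listed above translate a nucleotide sequence "in all 6 reading frames": three frames on the given strand and three on its reverse complement. The following is a minimal sketch of that idea, assuming the standard genetic code and a DNA sequence containing only A, C, G and T. The function names and the frame labels (+1 to +3, -1 to -3) are illustrative choices, and this is not the code BLAST itself uses.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; '*' marks a stop, 'X' an unrecognized codon."""
    # An incomplete codon at the end of the sequence is ignored.
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map each frame label to the translation of the sequence in that frame."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Hypothetical six-base query, short enough to check by hand.
for label, peptide in six_frame_translations("ATGGCC").items():
    print(label, peptide)
```

Each of the six translated sequences can then be compared against protein sequences, which is what allows a nucleotide query to be searched against a protein database.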
It is generally preferable to BLAST
protein sequences rather than nucleotide sequences. This is because proteins
are polymers of 20 amino acids, whereas DNA and RNA are polymers
of only 4 nitrogenous bases. Assuming all residues are equally frequent, the probability of
an accidental match at any single position is only 1/20 for two protein sequences, but
1/4 for two nucleotide sequences.
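To see how quickly this difference compounds over a stretch of several residues, here is a small sketch. It assumes that every residue is equally frequent and that positions are independent, which real sequences only approximate. The function name is hypothetical.

```python
def chance_match_probability(alphabet_size, length):
    """Probability that `length` consecutive positions all match by chance,
    assuming equally frequent residues and independent positions."""
    return (1 / alphabet_size) ** length


for length in (1, 5, 10):
    protein = chance_match_probability(20, length)
    nucleotide = chance_match_probability(4, length)
    print(f"{length:2d} positions: protein {protein:.3g}, nucleotide {nucleotide:.3g}")
```

Under these assumptions, a run of identical residues is far less likely to arise by accident between two proteins than between two nucleotide sequences of the same length.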
TUTORIAL (20 minutes)
Obtaining a FASTA-formatted amino acid sequence. As a shortcut, we will use the Entrez "Gene" database to quickly access the amino acid sequence of a gene product. The amino acid sequence could also be obtained by searching protein sequence databases such as NCBI's Entrez; this process, however, can be more involved and rather time-consuming, since it often requires examining and sifting through several sequence records.
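A FASTA record consists of a header line beginning with ">" followed by one or more lines of sequence. As a rough illustration of the format, the sketch below reads FASTA text into (header, sequence) pairs. The function name and the example record are hypothetical, and the parser ignores edge cases such as comment lines.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical record; the identifier and sequence are made up for illustration.
example = """>example_protein hypothetical record
MKTAYIAKQR
QISFVKSHFS
"""
for header, sequence in parse_fasta(example):
    print(header, len(sequence), sequence)
```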
Submitting a query sequence. Once the amino acid sequence has been obtained in FASTA format, it must be submitted to the BLAST site using the on-line forms provided by NCBI.
Understanding the BLAST Conserved Domain Search. The first page returned after a BLAST search is submitted is the "Conserved Domain Search" page. This page identifies domains in the query sequence that are shared with other proteins. Domains are usually 25-250 amino acids long and have a characteristic structure and function. Between 1000 and 3000 domains have been characterized. The average protein contains 2-3 domains.
Understanding the BLAST "hit list". BLAST returns a list of all records in a sequence database that are similar to the query sequence.
EXERCISE (20-30 minutes)
Additional BLAST resources. BLAST search option guide.