BLAST

Sequence similarity searching using NCBI BLAST


Adapted and Modified from the U.S. Department of Energy (DOE), "Gene Gateway" at Oak Ridge National Laboratory.

William S. Barnes, Ph.D.
Clarion University of Pennsylvania

INTRODUCTION

Tutorial

Practice Exercise

Poster Activity

Additional BLAST Resources

BLAST Search Options Guide

INTRODUCTION

BLAST (Basic Local Alignment Search Tool) is a set of programs designed to perform similarity searches on all available sequence data. Scientists frequently use such searches to gain insight into the function and biological importance of gene products.
BLAST is an algorithm that searches for sequences related to a query sequence. It does this by performing local alignments (the alignment of some portion of two sequences) as opposed to global alignments (the alignment of two sequences over their entire length). Sequences with "good" alignments are putatively related to each other and are likely to have similar structure and function.
The BLAST algorithm comes in a number of flavors, a few of which are:
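The difference between local and global alignment can be illustrated with a small scoring sketch. This is not how BLAST itself works (BLAST uses a faster heuristic); it uses the classic Smith-Waterman (local) and Needleman-Wunsch (global) dynamic-programming scores, and the function names and the match/mismatch/gap values are arbitrary choices made for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman score: the best-matching portion of a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch score: both sequences aligned over their entire length."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


print(local_alignment_score("AAAAGATTACA", "GATTACA"))   # 14
print(global_alignment_score("AAAAGATTACA", "GATTACA"))  # 6
```

The local score rewards the shared GATTACA region in full, while the global score is pulled down by the gaps needed to account for the unmatched AAAA at the start.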

NAME     QUERY                  SUBJECT                DATABASE    ACTION
BLASTN   nucleotide             nucleotide             nucleotide  Compares a nucleotide query sequence against a nucleotide sequence database.
BLASTP   protein                protein                protein     Compares an amino acid query sequence with others stored in protein sequence databases.
BLASTX   nucleotide             protein                protein     Compares a nucleotide query sequence translated in all 6 reading frames with amino acid sequences stored in protein sequence databases.
TBLASTN  protein                translated nucleotide  nucleotide  Compares an amino acid query sequence with nucleotide sequences translated in all 6 reading frames.
TBLASTX  translated nucleotide  translated nucleotide  nucleotide  Compares a nucleotide query sequence translated in all 6 reading frames with nucleotide sequences translated in all 6 reading frames.
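
Translation "in all 6 reading frames" means three frames on the given strand plus three on its reverse complement. A minimal sketch follows, assuming the standard genetic code and an unambiguous DNA sequence; the function names and the frame labels (+1 to +3, -1 to -3) are illustrative conventions, not part of BLAST.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; '*' is a stop, 'X' an unrecognized codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to their translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for label, protein in six_frame_translations(
    "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
).items():
    print(label, protein)  # frame +1 is MAIVMGR*KGAR*
```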

It is generally preferable to BLAST protein sequences rather than nucleotide sequences. This is because proteins are polymers of 20 amino acids, whereas DNA and RNA are polymers of only 4 nitrogenous bases. If all residues were equally common, the probability of an accidental match at any single aligned position would be only 1/20 for two protein sequences, but 1/4 for two nucleotide sequences.
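
The effect of alphabet size compounds over longer stretches. The sketch below assumes every residue is equally likely and that positions are independent, which real sequences do not strictly obey (BLAST's own statistics rely on substitution matrices and E-values instead); the function name is hypothetical.

```python
from fractions import Fraction


def chance_identity_probability(alphabet_size, length):
    """Probability that two random sequences match at every one of
    `length` aligned positions, assuming uniform, independent residues."""
    return Fraction(1, alphabet_size) ** length


# Ten consecutive identical positions by chance alone:
print(float(chance_identity_probability(4, 10)))   # nucleotides
print(float(chance_identity_probability(20, 10)))  # amino acids
```

Under these assumptions, a run of ten chance matches is roughly a one-in-a-million event for nucleotides and roughly a one-in-ten-trillion event for amino acids.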

TUTORIAL (20 minutes)

Obtaining a FASTA-formatted amino acid sequence.    As a shortcut, we will use the Entrez "Gene" database to quickly access the amino acid sequence of a gene product. The amino acid sequence could also be obtained by searching protein sequence databases such as NCBI's Entrez; this process, however, can be more involved and time-consuming, since it often requires sifting through several sequence records.
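
A FASTA record is a header line beginning with ">" followed by one or more lines of sequence. The sketch below shows one way to read such text into memory; the function name and the sample record (header and residues) are made up for illustration and do not correspond to a real database entry.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}


# Made-up example record, for illustration only.
sample = """>example_protein hypothetical record
MKTAYIAKQR
QISFVKSHFS
"""
for header, sequence in parse_fasta(sample).items():
    print(header, len(sequence), sequence)
```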

Submitting a query sequence.    Once the amino acid sequence has been obtained in FASTA format, it must be submitted to the BLAST site using the on-line forms provided by NCBI.

Understanding the BLAST Conserved Domain Search.    The first page returned after a BLAST search is submitted is the "Conserved Domain Search" page. This page identifies domains in the query that are shared with other proteins. Domains are usually 25-250 amino acids long, and have a characteristic structure and function. Somewhere between 1000 and 3000 domains have been characterized. The average protein contains 2-3 domains.

Understanding the BLAST "hit list".    BLAST returns a list of all records in a sequence database which are similar to the query sequence.

EXERCISE   (20-30 minutes)

DO THE ACTIVITY FOR YOUR POSTER

Additional BLAST resources.
BLAST search option guide.
