GenBank

GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis. Many journals require submission of sequence information to a database prior to publication so that an accession number may appear in the paper. The sequence data submitted by an investigator to any of the three databases appears in all of them and is available to all users.

The amount of sequence data in these databases has grown exponentially, and in August 2005 the combined total exceeded 100 gigabases (1 x 10¹¹ bases). Of these, approximately 90 billion were in GenBank, 10 billion in EMBL and 5 billion in DDBJ. As of February 2006, there were 59,750,386,305 bases in 54,584,635 sequence records in the traditional GenBank divisions and 63,183,065,091 bases in 12,465,546 sequence records in the whole genome shotgun (WGS) division.
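To put the February 2006 figures in perspective, the short Python sketch below adds up the two sets of numbers quoted above and works out the average record length in each. The variable names are illustrative only; the numbers themselves come straight from the paragraph above.

```python
# Figures quoted in the text for February 2006.
traditional_bases = 59_750_386_305
traditional_records = 54_584_635
wgs_bases = 63_183_065_091
wgs_records = 12_465_546

# One gigabase is 10**9 bases, so 100 gigabases is 10**11 bases.
GIGABASE = 10**9

total_bases = traditional_bases + wgs_bases
total_records = traditional_records + wgs_records

# Average number of bases per sequence record in each group.
avg_traditional = traditional_bases / traditional_records
avg_wgs = wgs_bases / wgs_records

print(f"Total bases:   {total_bases:,} ({total_bases / GIGABASE:.1f} gigabases)")
print(f"Total records: {total_records:,}")
print(f"Average record length, traditional divisions: {avg_traditional:,.0f} bases")
print(f"Average record length, WGS division:          {avg_wgs:,.0f} bases")
```

The averages suggest that a typical WGS record is several times longer than a typical record in the traditional divisions, which is why the WGS division holds more bases in far fewer records.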

Access to the databases comes from almost every country in the world, although U.S. companies and U.S. universities are by far the largest users. The number of data requests has increased steadily since 1997, with predictable dips during summer vacation times and especially on Christmas Day (although there were still over 50,000 hits by exceptionally geeky computational biologists in 2000!).

For each record it stores, GenBank provides the base sequence, the translated amino acid sequence and references. In addition, each record gives the position of any coding sequences identified within it and the putative function of the encoded protein.
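As a rough illustration of how these pieces sit together in one record, here is a minimal Python sketch that pulls the accession, the coding sequence (CDS) position, the translation and the base sequence out of a flat-file-style record. The record below is a made-up toy (the accession TOY00001 is not a real GenBank entry), and the parser is a simplified assumption that handles only simple "start..end" locations and single-line qualifiers. Real records are richer, and the tutorials listed below describe the actual format.

```python
import re

# Hypothetical toy record, invented for illustration only.
TOY_RECORD = """\
LOCUS       TOY00001                  33 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical toy record for illustration only.
ACCESSION   TOY00001
FEATURES             Location/Qualifiers
     CDS             1..33
                     /product="hypothetical protein"
                     /translation="MKTAYIAKQR"
ORIGIN
        1 atgaaaaccg cgtatattgc gaaacagcgt taa
//
"""


def parse_toy_record(text):
    """Extract a few fields from a simplified flat-file-style record."""
    info = {"accession": None, "cds": [], "sequence": ""}
    current_feature = None
    in_origin = False
    for line in text.splitlines():
        if line.startswith("ACCESSION"):
            info["accession"] = line.split()[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break  # end-of-record marker
        elif in_origin:
            # Sequence lines: a position number followed by blocks of bases.
            info["sequence"] += "".join(re.findall(r"[a-z]+", line))
        else:
            # Feature lines: five spaces, a feature key, then a location.
            feature = re.match(r" {5}(\S+)\s+(\S+)$", line)
            if feature:
                current_feature = feature.group(1)
                location = re.fullmatch(r"(\d+)\.\.(\d+)", feature.group(2))
                if current_feature == "CDS" and location:
                    info["cds"].append(
                        {"start": int(location.group(1)),
                         "end": int(location.group(2))}
                    )
                continue
            # Qualifier lines: /name="value", attached to the current CDS.
            qualifier = re.match(r'\s+/(\w+)="([^"]*)"', line)
            if qualifier and current_feature == "CDS" and info["cds"]:
                info["cds"][-1][qualifier.group(1)] = qualifier.group(2)
    return info


record = parse_toy_record(TOY_RECORD)
print("Accession:", record["accession"])
print("Sequence length:", len(record["sequence"]))
for cds in record["cds"]:
    print("CDS from", cds["start"], "to", cds["end"], "product:", cds.get("product"))
    print("Translation:", cds.get("translation"))
```

In the toy record the CDS covers all 33 bases: ten codons for the ten amino acids in the translation, plus a final stop codon.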

GenBank was recently (ca. early 2006) split into three divisions, and is now referred to as the Umbrella Nucleotides Data Base. Its three divisions are: 1) the Core Nucleotides data base (DNA and mRNA sequences); 2) the EST data base (expressed sequence tags); and 3) the GSS data base (genome survey sequences).

TUTORIAL #1 (15-20 minutes) The GenBank Flat File and its Format

EXERCISE #1 (20-30 minutes)

TUTORIAL #2 (15-20 minutes) Dissection of a GenBank Record

EXERCISE #2 (20-50 minutes)

DO THE FOLLOWING ACTIVITY FOR YOUR POSTER
