The RefSeq Database.

The Reference Sequence (RefSeq) project provides sequence data and related information for the scientific community to use as a standard. The RefSeq database therefore provides standard sequences for gene characterization, mutation analysis, expression studies, and polymorphism discovery.

RefSeq differs from GenBank in that GenBank is an archival repository of sequences submitted by investigators from all over the world, and many of those sequences are essentially the same. As a product of expert curation at NCBI, RefSeq addresses several limitations of GenBank:

  1. The RefSeq database is non-redundant because each distinct molecule is represented by a single sequence, derived from all the similar sequences in GenBank.

  2. Each RefSeq record serves as a reference standard because in principle it is more accurate, and more completely annotated, than any single sequence in GenBank.

  3. GenBank sequence records are owned by the original submitter and cannot be altered by a third party. RefSeq records, in contrast, are created by NCBI curators from primary sequences submitted to GenBank.

  4. Because they are owned by NCBI, RefSeq records can be updated as needed to incorporate additional sequence information and to keep the annotation in line with current knowledge of the corresponding biology.


The RefSeq database contains many different kinds of records, which are distinguished by a prefix to their accession numbers. Some of the more common prefixes, and the types of records to which they are affixed, are given below:
 

Format      Description
NM_123456   The record contains the sequence of a mature messenger RNA (mRNA) transcript: the sequence after exon splicing, which codes for a protein.
NP_123456   The record contains the amino acid sequence of a protein, derived by translation of the mature mRNA.
NC_123456   The record contains the complete sequence of a genomic molecule: a genome, chromosome, organelle or plasmid. These records provide the sequence of entire genes, including the exons, introns and splice points.
NT_123456   The record contains the DNA sequence of a contig that is part of a larger assembly. These records provide the sequence of entire genes, including the exons, introns and splice points.
NR_123456   The record contains the sequence of a non-coding transcript, such as a structural RNA or a transcribed pseudogene.
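
As an illustration only, the prefix convention above can be turned into a small lookup. The sketch below is not an NCBI tool: the function name `classify_accession` and the dictionary are hypothetical, the table covers only the five prefixes listed here (RefSeq defines others), and the pattern assumes an accession of the form two letters, an underscore, digits, and an optional `.version` suffix.

```python
import re

# Descriptions follow the table above; RefSeq defines further prefixes
# that are deliberately not covered by this sketch.
REFSEQ_PREFIXES = {
    "NM": "mature mRNA transcript",
    "NP": "protein",
    "NC": "complete genomic molecule",
    "NT": "genomic contig",
    "NR": "non-coding transcript",
}

# Assumed shape: two uppercase letters, "_", digits, optional ".version".
_ACCESSION = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def classify_accession(accession):
    """Return the record type for a RefSeq-style accession, or None if unknown."""
    match = _ACCESSION.match(accession.strip())
    if match is None:
        return None  # not in the assumed RefSeq-style format
    return REFSEQ_PREFIXES.get(match.group(1))  # None for unlisted prefixes


for example in ("NM_123456", "NP_123456", "NR_123456"):
    print(example, "->", classify_accession(example))
```

An accession without the underscore-separated prefix, such as a typical GenBank accession, does not match the pattern and returns `None`.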



 
Note: RefSeq flat files are all in the GenBank format, with the same main sections:
  • Locus section
  • References section
  • Features Table with Locations and Qualifiers
  • ORIGIN
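
Because the sections listed above each begin with a keyword at the start of a line, a record can be split into sections with a few lines of code. The following is a minimal sketch, not a full GenBank parser: `split_sections` is a hypothetical helper, the `SAMPLE` record is a made-up toy entry rather than a real RefSeq record, and the code assumes that top-level keywords start in the first column and that `//` ends the record.

```python
# A made-up, heavily shortened record used only to exercise the function.
SAMPLE = """\
LOCUS       NM_123456    12 bp    mRNA    linear
REFERENCE   1  (bases 1 to 12)
  AUTHORS   Example,A.
FEATURES             Location/Qualifiers
     CDS             1..12
                     /gene="EXAMPLE"
ORIGIN
        1 atggcctaag tt
//
"""


def split_sections(flat_file_text):
    """Group the lines of one GenBank-format record by top-level keyword."""
    sections = {}
    current = None
    for line in flat_file_text.splitlines():
        if line.startswith("//"):  # assumed record terminator
            break
        if line and not line[0].isspace():
            # A keyword in the first column opens a new section.
            current = line.split()[0]
            sections.setdefault(current, [])
        if current is not None:
            sections[current].append(line)
    return sections


sections = split_sections(SAMPLE)
for keyword, lines in sections.items():
    print(keyword, len(lines))
```

Repeated keywords (a record usually has several REFERENCE blocks) are merged under one key here. For real work, an established parser such as the one in Biopython is the safer choice.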
 

