Condensed and edited from “SNP Fact Sheet” at the ORNL website, “SNPs: Variations on a Theme” at the NCBI website, and “What is the HapMap?” at the International HapMap website.

William S. Barnes, Ph.D., Clarion University of Pennsylvania
 

The human genome is composed of approximately 3,000,000,000 base pairs and 25,000 genes. About 99.9% of the sequence of the human genome is the same in all 7,000,000,000 people alive on earth today. It is the relatively small number of differences in the genome (the remaining 0.1%, or roughly 3,000,000 base pairs) that accounts for the phenotypic differences between individual humans.


SNPs.

Most of these differences are changes in a single base. These are called SNPs (pronounced "snips"), an abbreviation for single nucleotide polymorphisms. For example, a substitution of T for A at the second position of the DNA sequence AAGGCTAA would change it to ATGGCTAA. For a variation to be considered an SNP, it must occur in at least 1% of the population.
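The substitution example above can be sketched in a few lines of Python. This is only an illustration: the function name find_snp_positions is hypothetical, and the sketch assumes two already-aligned sequences of equal length, which real SNP discovery does not get for free.

```python
def find_snp_positions(reference, variant):
    """Return (position, reference_base, variant_base) for each single-base difference.

    Positions are 1-based. Both sequences are assumed to be aligned and of equal length.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length")
    return [
        (index + 1, ref_base, var_base)
        for index, (ref_base, var_base) in enumerate(zip(reference, variant))
        if ref_base != var_base
    ]


# The example from the text: T replaces A at the second base.
print(find_snp_positions("AAGGCTAA", "ATGGCTAA"))  # [(2, 'A', 'T')]
```

Note that a single difference between two sequences is only a candidate; by the definition above it counts as an SNP only if the variant occurs in at least 1% of the population.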

The following are important facts about SNPs:

  • SNPs account for about 90% of all human genetic variation. 
  • There are approximately 1,400,000 human SNPs (as of 2005), which occur, on average, every 100 to 300 bases along the 3-billion-base human genome. 
  • SNPs can occur in both coding (gene) and noncoding regions of the genome. Approximately 200,000 SNPs occur in coding regions (called cSNPs) and the remainder are located in noncoding regions of the genome. 
  • Two of every three SNPs involve the replacement of cytosine (C) with thymine (T). 


SNPs are of great biomedical interest because variations in DNA sequence can have a major impact on how humans respond to disease; environmental insults such as bacteria, viruses, toxins, and chemicals; and drugs and other therapies. 

An example is apolipoprotein E (ApoE), one of the genes associated with Alzheimer's. This gene contains two SNPs that result in three possible alleles: E2, E3, and E4. Each allele differs by one DNA base, and the protein product of each allele differs by one amino acid. Each individual inherits one maternal copy of ApoE and one paternal copy of ApoE, so there are 6 possible genotypes: E2/E2, E2/E3, E2/E4, E3/E3, E3/E4, and E4/E4.
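The count of six follows from choosing two alleles out of three when order does not matter and the same allele may be inherited from both parents. A minimal sketch of that enumeration (the function name possible_genotypes is hypothetical):

```python
from itertools import combinations_with_replacement


def possible_genotypes(alleles):
    """List every unordered pair of alleles, including homozygous pairs."""
    return ["/".join(pair) for pair in combinations_with_replacement(alleles, 2)]


genotypes = possible_genotypes(["E2", "E3", "E4"])
print(len(genotypes))  # 6
print(genotypes)       # ['E2/E2', 'E2/E3', 'E2/E4', 'E3/E3', 'E3/E4', 'E4/E4']
```

In general, n alleles give n(n + 1)/2 genotypes, which is 6 for n = 3.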

Research has shown that an individual who inherits at least one E4 allele will have a greater chance of getting Alzheimer's. Apparently, the change of one amino acid in the E4 protein alters its structure and function enough to make disease development more likely. Inheriting the E2 allele, on the other hand, seems to indicate that an individual is at lower risk.


Haplotypes.

Even SNPs that have no effect on function may still be useful as “tags” for the multiple genes associated with such complex diseases as cancer, diabetes, vascular disease, and some forms of mental illness. These associations are difficult to establish with conventional gene-hunting methods because a single altered gene may make only a small contribution to the disease. However, SNPs can be used as markers to locate genes on chromosomes, and this leads to the “International HapMap Project”.

If there are 1 x 10^7 (10 million) SNPs distributed across 23 chromosomes, there are approximately 500,000 SNPs / average chromosome. Some SNPs are so close to each other that they will tend to be inherited together as a block. Although the current terminology is slightly ambiguous, for the purposes of this tutorial a set of SNPs which are clustered together at a particular location on a single chromosome will be said to form a haplotype block. Since SNPs are by definition polymorphic, different SNP alleles will result in different haplotypes (variants of a haplotype block). 

For example, suppose:

  • There are 3 closely linked SNPs within a haplotype block. SNP-A has three alleles, while SNP-B and SNP-C have two alleles each. This means that in principle there are 3 x 2 x 2 = 12 possible haplotypes for this haplotype block.
  • If this haplotype block happened to be located in region 5 of chromosome 10, then up to 12 different versions of chromosome 10 could be found in the population, which would be distinguished according to which haplotype they carried.
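The count of possible haplotypes in this example is the product of the number of alleles at each SNP. The sketch below enumerates them; the allele labels (A1, A2, A3 and so on) and the function name enumerate_haplotypes are made up for illustration and do not come from any real data set.

```python
from itertools import product


def enumerate_haplotypes(snp_alleles):
    """Return every combination of one allele per SNP within a haplotype block."""
    return list(product(*snp_alleles.values()))


# Hypothetical allele labels for the three linked SNPs in the example.
block = {
    "SNP-A": ["A1", "A2", "A3"],  # three alleles
    "SNP-B": ["B1", "B2"],        # two alleles
    "SNP-C": ["C1", "C2"],        # two alleles
}

haplotypes = enumerate_haplotypes(block)
print(len(haplotypes))  # 12
print(haplotypes[0])    # ('A1', 'B1', 'C1')
```

This is the number possible in principle. Because closely linked SNPs tend to be inherited together, far fewer combinations are usually observed in a real population.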


The real situation is more complex than this because there are roughly 500,000 SNPs per average chromosome, so each chromosome would carry many haplotype blocks, with each haplotype block located at a different place on the chromosome.


The HapMap Project.

With so many different haplotype blocks, each with several different haplotypes, it becomes interesting to know the frequencies of haplotypes within and between different populations. One of the big biology projects of the mid-2000s is the “International HapMap Project”, which aims to map each SNP to a locus on one of the 23 chromosomes, and to locate each one within haplotype blocks of linked loci.

In many parts of our chromosomes, there are only a few haplotypes. In a given population, 55 percent of people may have one haplotype, 30 percent may have another, 8 percent may have a third, and the rest may have a variety of less common haplotypes. The number of these haplotypes is estimated to be about 300,000 to 600,000, which is far fewer than the 10 million common SNPs. This greatly simplifies the task of finding the genes associated with polygenic traits such as cancer, diabetes, vascular disease, and some forms of mental illness. 

For example, consider the task of identifying all the genes associated with high blood pressure. The HapMap will make it possible to compare the haplotypes of individuals who have high blood pressure with those who do not. If people with high blood pressure tend to share a particular haplotype, then genes contributing to the disease might be somewhere within or near that haplotype, and more detailed screening of that region could be done. 
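The comparison described above can be sketched as follows. Everything here is an assumption made for illustration: the haplotype labels (H1, H2, H3), the sample counts, the function names, and the 0.1 frequency-difference threshold are all invented, and a real association study would use a proper statistical test rather than a fixed cutoff.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Return the fraction of observations carrying each haplotype."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return {haplotype: count / total for haplotype, count in counts.items()}


def enriched_haplotypes(cases, controls, min_difference=0.1):
    """Return haplotypes whose frequency in cases exceeds that in controls
    by at least min_difference (a hypothetical cutoff, not a significance test)."""
    case_freq = haplotype_frequencies(cases)
    control_freq = haplotype_frequencies(controls)
    enriched = {}
    for haplotype in set(case_freq) | set(control_freq):
        difference = case_freq.get(haplotype, 0.0) - control_freq.get(haplotype, 0.0)
        if difference >= min_difference:
            enriched[haplotype] = difference
    return enriched


# Invented data: haplotypes observed at one block in people with and
# without high blood pressure.
cases = ["H1"] * 30 + ["H2"] * 60 + ["H3"] * 10
controls = ["H1"] * 55 + ["H2"] * 30 + ["H3"] * 15

print(enriched_haplotypes(cases, controls))
```

With this invented data only H2 is more common among cases than controls by the chosen margin, so the region carrying H2 would be the candidate for more detailed screening.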
 


 
 
