Features of the Genome: noncoding DNA


barnes@mail.clarion.edu             Send questions or comments to Dr. Barnes!




 

Non-coding DNA: non-repetitive "junk" DNA
This class of DNA is also known as selfish, ignorant, parasitic and incidental DNA.  It makes up approximately 97% of the human genome. 

  "junk DNA". A general term that once encompassed all non-coding DNA when its function was not understood. Dr. Susumu Ohno, writing in the Brookhaven Symposium on Biology in 1972 in the article "So Much 'Junk' DNA in Our Genome", is credited with originating the term. But his paper was focused "mainly on the fossilized genes, called pseudogenes, that are strewn like tombstones throughout our DNA. As the term caught on in the 1980s, its meaning was extended to all non-coding sequences, the vast stretches of DNA that are not genes and do not produce proteins" (about 95% of the genome).

In the last 2 decades however, much has been learned and we know now that this "junk" contains many different types of DNA sequences.  "I don't think people take the term very seriously anymore" says Eric Green [NHGRI] whose group is mapping chromosome 7. [B. Kuska "Should Scientists Scrap the Notion of Junk DNA?" JNCI 90(14): 1032-1033 July 15 1998]

Still, the term junk DNA is frequently used incorrectly. Numerous articles in the medical literature use junk and non-coding DNA interchangeably. [B. Kuska "Bring in Da Noise, Bring in Da Junk" JNCI 90(15): 1125-1127 Aug. 5, 1998]

 

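introns: intervening sequences within genes that are transcribed but spliced out of the RNA before translation (everybody knows what these are!).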

 

UTRs. The parts of the messenger RNA sequence that do not code for protein, i.e. the 5' UNTRANSLATED REGION (upstream of the coding sequence) and the 3' UNTRANSLATED REGION (downstream of it).
http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
 
 


 

Non-coding DNA:  repetitive sequences
 
Non-coding regions make up approximately 97% of the human genome, and repetitive sequences make up at least 50% of the genome.

Repetitive sequences  are thought to have no direct functions, but they shed light on chromosome structure and dynamics. They hold important clues about evolutionary events, help chart mutation rates, and by seeding DNA rearrangements, they can modify genes and create new ones. They also serve as tools for genetic studies.

The vast majority of repeated sequences in the human genome are derived from transposable elements - sequences like those that form viral genomes - that propagate by inserting fresh copies of themselves in random places in the genome. A full 45% of the human genome derives from such transposable elements. A major surprise of this new global analysis of the human genome is that many components in this diverse array of repeated sequences, traditionally considered to be "junk," appear to have played a beneficial role over the course of human evolution.
[NHGRI "Summary of the Initial Sequencing and Analysis of the Human Genome" press release, Feb. 11, 2001]
http://www.nhgri.nih.gov/NEWS/summary_of_sequence.html

Another important, although much smaller, class of non-coding DNA consists of the tandem repeats, or satellite sequences. These are associated with centromeres and telomeres, so they may have a function that is not yet understood.



1.  Transposable elements

 
    • Transposons  (move as DNA)
    • Retroposons (Retrotransposons)  (move as RNA intermediates)
      • Viral-like Retroposons (terminal LTRs)
      • Nonviral-like Retroposons  (lack LTRs).
        • LINEs (Long Interspersed Nuclear Elements)
        • SINEs (Short Interspersed Nuclear Elements)
          • Alu family
These are discrete sequences in the genome that are mobile - they are able to transport themselves to other locations within the genome.
  • The mark of a transposable element is that it does not exist as an independent element outside the genome (as phage or plasmid DNA does). It moves by itself, independently, from one site in the genome to another.
  • Unlike most other processes involved in genome restructuring, transposition does not rely on any relationship between the sequences at the donor and recipient sites !!!!!
  • Transposable elements sometimes incorporate additional sequences, and move them to new sites elsewhere within the same genome. They are therefore an internal counterpart to the vectors that can transport sequences from one genome to another.  They may provide the major source of mutations in the genome by:
    • insertion and interruption of a functional gene
    • up-regulating or down-regulating a gene as it is moved about the genome.

 
    There are 2 types of Transposable Elements:
     
    • Transposons move directly as DNA.  They do not go through an RNA intermediate or reverse transcription.  Instead the enzyme transposase is used to nick the host DNA and insert the transposon DNA into the cut ends.
    • Retroposons (retrotransposons) move through an RNA intermediate. The enzyme reverse transcriptase copies this RNA back into DNA, and the DNA copy is inserted into the genome. There are 2 sub-types of Retroposons:
      • 1.) viral-like retroposons, which have LTRs (long terminal repeats).
      • 2.) nonviral-like retroposons, which lack LTRs.

 
Characteristics of Transposable Elements
Transposable element                          Length      Number per haploid genome*    Fraction of genome
Transposons                                   2-3 Kb      ~300,000                      3%
Viral-like retroposons (with LTRs)            1-11 Kb     ~450,000                      8%
LINEs (nonviral-like retroposons; no LTRs)    6-8 Kb      ~600,000                      15-20%
SINEs (nonviral-like retroposons; no LTRs)    < 0.3 Kb    ~1,000,000-1,500,000          10-15%
* These estimates vary a great deal depending on the reference!
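As a quick consistency check, the fraction column can be added up and compared with the NHGRI figure that about 45% of the human genome derives from transposable elements. A minimal Python sketch using only the numbers in the table; keeping each range as a (low, high) pair is just an illustrative choice.

```python
# Genome fractions from the table, as (low, high) percentages.
# Single values are written as a range with equal bounds.
fractions = {
    "Transposons": (3, 3),
    "Viral-like retroposons (with LTRs)": (8, 8),
    "LINEs": (15, 20),
    "SINEs": (10, 15),
}

low = sum(lo for lo, hi in fractions.values())
high = sum(hi for lo, hi in fractions.values())
print(f"Transposable elements: {low}-{high}% of the genome")
# prints: Transposable elements: 36-46% of the genome
```

The total of roughly 36-46% brackets the ~45% estimate; given how much the individual estimates vary, that is about as close as one can expect.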



 
Transposons:    move directly as DNA without going through an RNA intermediate or reverse transcription.  The P element of Drosophila,  and the Ac element and Ds element in maize,  are examples.

In replicative transposition, the element is duplicated during the reaction. One copy remains at the original site, while the other inserts at the new site. So transposition is accompanied by an increase in the number of copies of the transposon.

In nonreplicative transposition, the transposing element moves as a physical entity directly from one site to another, and is conserved. This causes the element to be inserted at the target site and lost from the donor site.
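The difference between the two modes shows up most clearly in copy number. The sketch below is a deliberately abstract Python model, not a molecular one: the genome is a list of segment labels, and the function name `transpose` and its parameters are hypothetical.

```python
def transpose(genome, element, target_index, replicative):
    """Return a new genome after one transposition event.

    genome: list of segment labels, e.g. ["geneA", "TE", "geneB"]
    element: label of the transposable element; its first copy is the donor
    target_index: insertion point, counted in the ORIGINAL genome list
    replicative: True copies the element and leaves the donor copy in place;
                 False cuts the element out of the donor site and moves it
    """
    new_genome = list(genome)
    donor_index = new_genome.index(element)
    if not replicative:
        del new_genome[donor_index]      # element is lost from the donor site
        if target_index > donor_index:
            target_index -= 1            # positions after the donor shift left by one
    new_genome.insert(target_index, element)
    return new_genome


genome = ["geneA", "TE", "geneB", "geneC"]
print(transpose(genome, "TE", 3, replicative=True))
# ['geneA', 'TE', 'geneB', 'TE', 'geneC']
print(transpose(genome, "TE", 3, replicative=False))
# ['geneA', 'geneB', 'TE', 'geneC']
```

In the replicative case the element count goes from one to two; in the nonreplicative case it stays at one and only the position changes.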

In both processes, the critical enzyme is transposase, which is used to:

  1. nick the host DNA
  2. insert the transposon DNA into the cut ends.
Note: this process does not depend upon complementarity between the transposon DNA and the target (host) DNA!! It is therefore quite different from any mechanism based on homologous recombination.



Retroposons (retrotransposons): Retroposons (retrotransposons) are related to retroviruses;  their mobility is based on the formation of an RNA intermediate which is then reverse transcribed.  The DNA copies then become integrated at new sites in the genome. Transposition therefore involves an obligatory intermediate of RNA.  A diagnostic feature of retroposons is the generation of short direct repeats of target DNA at the site of an insertion.
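The short direct repeats arise because the insertion enzyme cuts the two strands of the target DNA a few bases apart (a staggered cut); when the single-stranded gaps on either side of the new insert are filled in, those few target bases end up on both sides of it. The Python sketch below models this on one strand only; the function name `insert_with_tsd`, the lowercase element sequence and the 4-bp duplication length are illustrative assumptions. Note that nothing in the function compares the element with the target sequence.

```python
def insert_with_tsd(target, element, site, tsd_len):
    """Insert `element` into `target` at a staggered cut (single-strand model).

    The bases target[site:site + tsd_len] lie between the two staggered nicks.
    After gap filling they appear on BOTH sides of the insert, forming the
    short direct repeats (target-site duplication).
    """
    if tsd_len < 0 or not 0 <= site <= len(target) - tsd_len:
        raise ValueError("cut site must leave room for the duplicated bases")
    left_flank = target[:site + tsd_len]   # ends with one copy of the target bases
    right_flank = target[site:]            # starts with the other copy
    return left_flank + element + right_flank


# Target bases CGTG (positions 4-7) are duplicated around the insert.
print(insert_with_tsd("AAAACGTGCCCC", "tttt", site=4, tsd_len=4))
# prints: AAAACGTGttttCGTGCCCC
```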

The difference between retroviruses and retroposons is that the former have the ability to insert into a host genome and then migrate to other cells.  The retroposon has only the ability to insert into the host genome, although over the course of many generations, this may occur many times.

Retroposons fall into two general classes: 1.) viral-like retroposons; 2.) nonviral-like retroposons. Both types move through an RNA intermediate. The enzyme reverse transcriptase copies the RNA into DNA, which is then inserted into the genome (by integrase, in the case of the viral-like retroposons).

 
  • Members of the viral superfamily. They contain LTRs and code for reverse transcriptase and integrase activities. They are characterized by ~250- to 600-bp long terminal repeats (LTRs) flanking the central protein-coding region, as in retroviruses. Well-known examples are the Ty elements in yeast and copia elements in Drosophila. The mechanism of transposition is similar to the mechanism of insertion for the retroviruses.
  • Members of the nonviral superfamily. These elements lack LTRs. They may code for reverse transcriptase and endonuclease activities.
    • LINEs (Long Interspersed Nuclear Elements) are long (average length = 6,500 bp) and moderately repetitive (about 10,000 copies). They encode their own reverse transcriptase (together with an endonuclease that nicks the target DNA), which is essential for their movement. The LINE reverse transcriptase is also thought to generate processed pseudogenes, which are cDNA copies of functional genes present in the same genome.
    • Processed pseudogenes are grouped here with the LINEs because they are also nonviral-like retroposed copies, but they lack a functional reverse transcriptase gene and any promoters. They have the properties of an mRNA transcript (5' UTR; spliced exons; short A/T-rich region derived from the poly(A) tail). It is thought that these pseudogenes originated by reverse transcription of a functional mRNA that was randomly integrated into the germline of an ancient ancestor. They are probably not pseudogenes that originated by duplication of whole genes, because they lack introns, carry the remnants of a poly(A) tail, and do not have flanking sequences similar to those of the functional gene copies. Because they are non-functional, they generally contain multiple mutations, which are thought to have accumulated since their mRNAs were integrated into chromosomal DNA.

     
    • SINEs (Short Interspersed Nuclear Elements). These are families of short (150 to 300 bp), moderately repetitive elements of eukaryotes, occurring about 1,000,000-1,500,000 times in a genome. Although they possess external and internal features that suggest that they originated in cellular transcripts (as opposed to retroviral sequences), they do not code for proteins that have transposition functions. They therefore cannot transpose on their own; any further movement depends on enzymes supplied by other elements, such as LINEs.
      Many SINEs appear to be DNA copies of certain tRNA molecules (the Alu family is instead derived from 7SL RNA), created presumably by the unintended action of reverse transcriptase during retroviral infection.
      • Alu elements, which occur on average every 6 Kb, are the most numerous SINEs in the human genome. Their short length and high degree of repetition make them comparable to satellite DNA, except that the individual members of the family are dispersed around the genome instead of being confined to tandem clusters.
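One of the diagnostic features listed above for processed pseudogenes, the remnant of a poly(A) tail, is easy to screen for in a sequence. The following Python function is a hypothetical heuristic, not a published method: `has_polya_remnant`, the 50-base window and the 10-base minimum A-run are arbitrary choices, and real remnants are often interrupted by mutations that a strict A-run search would miss.

```python
import re


def has_polya_remnant(seq, min_run=10, window=50):
    """Return True if a run of at least `min_run` A's lies in the last `window` bases.

    A crude screen for the A-rich tract left behind by an mRNA's poly(A) tail.
    """
    tail = seq[-window:].upper()
    return re.search("A{%d,}" % min_run, tail) is not None


coding_like = "ATGGCC" * 10               # no long A-run anywhere
processed_like = coding_like + "A" * 15   # same sequence with an A-rich 3' tract
print(has_polya_remnant(coding_like))     # False
print(has_polya_remnant(processed_like))  # True
```

A positive hit only flags a candidate; the other criteria in the text (no introns, no promoter, flanking sequences unlike those of the functional gene) still have to be checked.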

 

There are few currently active transposable elements in the human genome, but by contrast several active elements are known in the mouse genome. This explains why ~3% of spontaneous mutations in mouse are caused by LINE insertions, compared with only ~0.1% in humans. There appear to be ~10-50 active LINE elements in the human genome. Neither DNA transposons nor viral-like retroposons seem to have been active in the human genome for 40-50 million years, but several examples of both are found in the mouse.


 

2. Tandem Repeats:

These regions consist of relatively short consensus sequences that are repeated many times. Rather than being interspersed throughout the genome, like transposons, the repeats occur one right after another (in tandem).

There are 3 types of tandem repeat elements - based roughly on size - but the terminology is not precisely defined!

  • Satellite DNA, the highly repetitive fraction of the genome, consists of multiple tandem copies of very short repeating units. The length of each repeat unit is ~10-100 bp, but there may be 1,000 units or more, giving a total length of ~100 Kb. These often have unusual properties:
    • They often have a base composition (G-C content) that differs from that of the bulk of the genome, and may therefore be identified as a separate peak from single-copy DNA in density gradient centrifugation. This gave rise to the name satellite DNA.
    • They are often associated with inert regions of the chromosomes, and in particular with centromeres.
  • In addition to the satellite sequences, there are shorter stretches of DNA that show similar behavior, called minisatellites. Minisatellite repeating units are ~10-100 bp long, and an array is made up of 20-50 units with a total length of 1-5 Kb.
  • Minisatellites are also called VNTR (variable number tandem repeat) regions. Like satellites, they undergo unequal crossing-over between repeats, which changes the number of units in an array. The resulting high degree of variation between individual genomes can be used for mapping purposes or DNA fingerprinting.
  • The name microsatellite is usually used when the length of the repeating unit is <10 bp; many microsatellites are dinucleotide repeats. Microsatellites may undergo intrastrand mispairing: when the polymerase slips during replication, the repeat can expand (or contract).
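Finding a tandem repeat amounts to counting how many copies of a repeat unit follow one another without a gap. The Python sketch below does this by brute force and then applies a rough size classification; the function names are hypothetical, and the cut-offs (unit <10 bp for microsatellites, 1,000 or more units for satellites, everything else a minisatellite) are simply read off the ranges quoted above, which, as noted, are not precisely defined.

```python
def longest_tandem_run(seq, unit):
    """Find the longest run of back-to-back (tandem) copies of `unit` in `seq`.

    Returns (start_index, copy_number), or (-1, 0) if `unit` does not occur.
    A simple O(n * run length) scan: fine for a sketch, slow for a genome.
    """
    if not unit:
        raise ValueError("repeat unit must not be empty")
    k = len(unit)
    best_start, best_copies = -1, 0
    for start in range(len(seq) - k + 1):
        copies = 0
        # Count how many copies of `unit` follow each other from `start`.
        while seq[start + copies * k : start + (copies + 1) * k] == unit:
            copies += 1
        if copies > best_copies:
            best_start, best_copies = start, copies
    return best_start, best_copies


def classify_tandem_repeat(unit_len, copies):
    """Rough class from unit length (bp) and copy number; cut-offs are assumptions."""
    if unit_len < 10:
        return "microsatellite"
    if copies >= 1000:
        return "satellite"
    return "minisatellite"


seq = "GGT" + "CA" * 12 + "TTG"          # a (CA)12 dinucleotide repeat
start, copies = longest_tandem_run(seq, "CA")
print(start, copies)                               # 3 12
print(classify_tandem_repeat(len("CA"), copies))   # microsatellite
```

Comparing copy numbers at the same locus in different individuals is what VNTR and microsatellite typing relies on; slippage or unequal crossing-over shows up simply as a different `copies` value.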
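The reason a satellite shows up as its own band is that the buoyant density of DNA in a CsCl gradient rises almost linearly with its G-C content. A minimal Python sketch using the widely cited empirical relation of Schildkraut, Marmur and Doty (density = 1.660 + 0.098 x G+C fraction, in g/cm^3); the function names are hypothetical and the example sequences are made up, not real satellites.

```python
def gc_fraction(seq):
    """Fraction of G + C bases in `seq`."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def buoyant_density(seq):
    """Approximate buoyant density in CsCl (g/cm^3) from G+C content.

    Empirical linear relation: rho = 1.660 + 0.098 * (G+C fraction).
    """
    return 1.660 + 0.098 * gc_fraction(seq)


# A 10-percentage-point difference in G+C shifts the density by ~0.01 g/cm^3.
print(round(buoyant_density("GCAT" * 25), 3))    # 1.709 (50% G+C)
print(round(buoyant_density("GCATAT" * 20), 3))  # 1.693 (~33% G+C)
```

A difference of about 0.01 g/cm^3 is enough to resolve a satellite as a separate peak beside the main band of bulk DNA, whichever direction its base composition deviates.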

