Biocomputing in a Nutshell

Ulf Reimer and Georg Fuellen

A Brief Look at the Development of Biology

DNA
The 3-dimensional structure of DNA. Our homepage features an animation of DNA, and our background image is based on it.

Linus Pauling, the chemist, vitamin C enthusiast and anti-atom-bomb campaigner, determined the structure of the other type of molecule, the protein molecule: a chain made up of building blocks called amino acids.

Protein
The 3-dimensional structure of a protein, Beta-amylase. The main structural units of the protein, which are made up of just a few amino acids each, are shown in different colours.

This work inspired James Watson and Francis Crick in 1953 to elucidate the structure of DNA, the ABC of all known living matter. To cut a long story short, over the following years many people pieced the puzzle together: the building blocks of life are the 20 amino acids that make up proteins, and DNA carries the blueprints for these proteins in its own structure. DNA is a long strand built from 4 nucleotides, and their order is the code of life. It goes ACGTTCCTCCCGGGCTCC, and so on, and so on, and so on. If you know the code, you know the structure of all living things, at least in theory.

An animation of Guanine (G), one of the 4 standard nucleotide bases. The colored balls represent the atoms from which it is made. Similar ball-and-stick models can be constructed for the 20 amino acids.

Here is a summary of the relationship between DNA and protein:

From DNA to Protein
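
The step from DNA to protein can be sketched in a few lines of Python. This is only a minimal sketch, assuming the standard genetic code and a reading frame that starts at the first base. The names `CODON_TABLE` and `translate` are chosen for this illustration. Each amino acid is written as its one-letter code, and `*` marks a stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, with the codons enumerated
# in TCAG order (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA string codon by codon, starting at the first base.

    Trailing bases that do not fill a complete codon are ignored.
    Letters other than A, C, G and T raise a KeyError.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# The example strand from the text, read as six codons.
print(translate("ACGTTCCTCCCGGGCTCC"))  # -> TFLPGS
```

In a living cell the DNA is first copied into RNA and only then translated; the sketch skips that intermediate step and reads the codons directly from the DNA string.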

An Enormous Flood of Data

Human Genome Project


Name of data bank	Type of sequences stored	Number of sequences (1996)
EMBL / GENBANK	Nucleotide sequences	827174
SWISSPROT	Protein sequences	52205
PDB	Protein structures	4525

The figure below shows the growth of one typical data bank: the number of sequences in the SWISSPROT data bank over time.

Growth SWISSPROT
Growth of the SWISSPROT data bank.

How Can We Analyze the Flood of Data?

ancestors of organisms
phylogenetic trees
protein structures
protein function

Phylogenetic trees are genealogical trees built from a comparison of the amino acid sequences of a single protein, such as cytochrome C, sampled from different species. Proteins like Beta-amylase or hemoglobin cannot be used to get the "full picture", that is the full tree, because they do not occur in all living organisms. As a result of Darwinian evolution, the protein has a slightly different amino acid sequence in each species. One phylogenetic tree, for instance, was created from the cytochrome C sequences of several plants, animals and fungi. Part of this phylogenetic tree is shown below.

Drawing of a phylogenetic tree based on the amino acid sequence data of cytochrome C (see inset).
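
How a tree can be derived from sequence differences is sketched below. This is not the method used for the tree shown here; it is a minimal illustration that assumes the sequences are already aligned to equal length, measures distance as the fraction of differing positions, and joins the closest clusters first (average-linkage clustering, often called UPGMA). The four sequences and the names `species_A` to `species_D` are made up and are not cytochrome C data.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Join the closest clusters step by step; return the tree as nested tuples."""
    # Pairwise distances between the original sequences.
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    # Each cluster is keyed by its (sub)tree and lists its member names.
    clusters = {name: [name] for name in seqs}

    def average_distance(c1, c2):
        pairs = [(m, n) for m in clusters[c1] for n in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(*pair))
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters))


# Made-up aligned sequences, for illustration only.
TOY_SEQUENCES = {
    "species_A": "MKTAYIAKQR",
    "species_B": "MKTAYIAKQK",
    "species_C": "MKSAYIAHLR",
    "species_D": "MRSAFIAHLR",
}

print(upgma(TOY_SEQUENCES))
```

The nested tuples group the species with the fewest differences most closely, which is the basic idea behind the drawing above. Real phylogenetic programs use more refined distance measures and tree-building methods.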

Predicting protein structure from sequence is one of the most challenging tasks in today's computational biology. Roughly speaking, the task is to calculate an image like the one in the second figure of this text. Although most of the information about the 3-dimensional structure is encoded in the amino acid sequence, it is still unknown which information controls the process of protein folding. Among millions of possible folding products, a protein adopts one working, native structure. Since determining structures experimentally by methods like X-ray diffraction or NMR spectroscopy is very difficult and expensive, there is a great need for reliable prediction of the 3-dimensional structures of proteins from sequence data. Today's methods give fairly reliable results from the available sequence data; the odds of getting the structure "right" are about 65%.

Sequence comparison is a very powerful tool in molecular biology, genetics and protein chemistry. Frequently it is unknown which protein a new DNA sequence codes for, or whether it codes for any protein at all. If you compare a new coding sequence with all known sequences, there is a high probability of finding a similar sequence. Often the role that the matching protein from the data bank plays in the cell is already known. If you assume that a similar sequence implies a similar function, you now know much more about your new sequence than before. (See also the contribution by Joelle Thonnard in this volume.)
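
Such a comparison against a data bank can be sketched as follows. The sketch scores each entry with a basic Smith-Waterman local alignment, which is one common technique and not necessarily what any particular search service uses. The scoring values (match 2, mismatch -1, gap -2), the function names and the three data bank entries are assumptions made up for this illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of two sequences (linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # scores of the previous row
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def best_hit(query, database):
    """Return (name, score) of the data bank entry most similar to the query."""
    hits = ((name, smith_waterman_score(query, seq))
            for name, seq in database.items())
    return max(hits, key=lambda hit: hit[1])


# Made-up entries standing in for a real sequence data bank.
TOY_DATABASE = {
    "toy_entry_1": "TTACGGATCCGA",
    "toy_entry_2": "GGGGGGGGGGGG",
    "toy_entry_3": "ACGGTTCC",
}

print(best_hit("ACGGATCC", TOY_DATABASE))
```

If the function of the best hit is known, it becomes a first hypothesis for the function of the query. Searching a real data bank with hundreds of thousands of entries calls for faster, heuristic methods than this exhaustive scoring.
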
Proteins of one class often share a few amino acids that always occur at the same positions in the amino acid sequence. By looking for such "patterns" you can gain information about the activity of a protein of which only the gene (DNA) is known. Evaluating these patterns also yields information about the architecture of proteins. Often the patterns are part of active sites, which are the workbenches of proteins.
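
Looking for a pattern can be sketched with a regular expression. The example assumes the widely used N-glycosylation motif, written N-{P}-[ST]-{P} in PROSITE-style notation: an asparagine, any residue except proline, a serine or threonine, and again any residue except proline. The protein sequence and the names `MOTIF` and `find_motif` are made up for this illustration.

```python
import re

# N-{P}-[ST]-{P} expressed as a regular expression. The lookahead (?=...)
# lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein):
    """Return (position, matched residues) pairs, with positions counted from 1."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(protein)]


# Made-up protein sequence in one-letter code.
print(find_motif("MKNGSAPNPTQNLTK"))  # -> [(3, 'NGSA'), (12, 'NLTK')]
```

The asparagine at position 8 is not reported because it is followed by a proline. A match only suggests a possible site; whether it is actually used in the cell has to be checked by other means.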

What is our task in this field?

AND

Internet courses

Ulf Reimer

Georg Fuellen
