Evolutionary Computation Model Selection Project

 

Dr. James P. Hoffmann
Dr. Daniel E. Bentil
Dr. Osei M. Bonsu
Chris D. Ellingwood  
Department of Mathematics
Department of Mathematics
Department of Plant Biology
& Statistics
& Computer Science
University of Vermont
University of Vermont
Eastern Connecticut State University
Burlington, VT 05405
Burlington, VT 05405
Willimantic, CT 06226

Sponsored by: DOE EPSCoR Computational Biology and USDA Hatch grants


Summary:

We are pursuing a new approach to model selection that combines information theory and evolutionary algorithms (both genetic algorithms and genetic programming) to help us build optimally specified models of complex biological systems. Our ultimate goal is to better understand causality in these systems. Specifically, we use the measured system behavior to drive the evolution of appropriate mechanistic models. To avoid model overfitting, we integrate the Akaike Information Criterion into our fitness function to implement the principle of parsimony. We are currently using this approach to model invasive species dynamics and aquatic systems; however, we expect this technique to be useful in modeling many other complex systems.


Site Map:                                                

Overview

Software

Preliminary Results

Publications

Workshop


Overview of our genetic algorithm method:

A community of candidate models is randomly initialized. Some gene loci (model parameters) are effectively turned off, depending on the initialized state of an evolvable internal switch (0 indicates an inactive gene). Therefore, although all models have the same fixed genome length, their structure as evaluated by the fitness function varies. The observations are divided into two data subsets: a training set used by the fitness function for evaluating the candidate models, and a test set used for model validation.
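The genome layout described above can be sketched in a few lines of Python. This is only an illustration, not the project's C code: the genome length, the value range, the 70/30 random split, and all function names are assumptions made for the example.

```python
import random

GENOME_LENGTH = 6  # hypothetical number of parameter loci per model


def random_genome(rng):
    """Return a fixed-length genome: one on/off switch and one value per locus."""
    switches = [rng.randint(0, 1) for _ in range(GENOME_LENGTH)]
    values = [rng.uniform(-1.0, 1.0) for _ in range(GENOME_LENGTH)]
    return switches, values


def express(genome):
    """Apply the switches: an inactive locus (switch 0) contributes 0.0."""
    switches, values = genome
    return [v if s == 1 else 0.0 for s, v in zip(switches, values)]


def active_count(genome):
    """Number of active parameters, i.e. the model's effective complexity."""
    return sum(genome[0])


def split_observations(observations, train_fraction, rng):
    """Shuffle a copy of the observations and split it into training and test sets."""
    shuffled = list(observations)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(42)
population = [random_genome(rng) for _ in range(20)]  # randomly initialized community
train, test = split_observations(range(100), 0.7, rng)  # assumed 70/30 split
```

Every genome has the same length, but `express` zeroes the inactive loci, so two genomes can describe structurally different models. A fitness function would evaluate the expressed parameters against `train` only and keep `test` for validation.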


Software:

We have modified PGAPack, a public-domain parallel genetic algorithm library from Argonne National Laboratory, to include a GUI, additional output metrics, 2D and 3D visualization, and other features.

[Figure: screenshots of our software interface]

The fitness function and models are coded in C, optimized, and parallelized for SMP (symmetric multiprocessing) systems.


Preliminary Results:

Method: We chose a "correct" model from among the set of possible models and used it to generate the "true" data (with or without added Gaussian noise). The predictions of all evolving models are compared to the "true" data. Using this synthetic-data approach, we tested our modified genetic algorithm on dynamic physiological-ecology models that we built to simulate some of the biochemistry and biophysics of a leaf undergoing photosynthesis. Note: the genetic algorithm does not "know" which model produced the data. Like evolution, it operates blindly, and the fittest models tend to survive and reproduce their structure.
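The synthetic-data method can be sketched as follows. The linear `correct_model`, the noise level, and the candidate model are hypothetical stand-ins chosen for brevity; the models used in the experiments are the leaf photosynthesis models described below, not this toy function.

```python
import random


def correct_model(t):
    """Hypothetical stand-in for the chosen data-generating model."""
    return 2.0 + 0.5 * t


def make_true_data(times, noise_sd, rng):
    """Evaluate the data-generating model, optionally adding Gaussian noise."""
    data = []
    for t in times:
        value = correct_model(t)
        if noise_sd > 0.0:
            value += rng.gauss(0.0, noise_sd)
        data.append(value)
    return data


def sum_squared_error(predictions, observations):
    """Misfit between a candidate model's predictions and the "true" data."""
    return sum((p - o) ** 2 for p, o in zip(predictions, observations))


rng = random.Random(0)
times = [0.1 * i for i in range(50)]
clean = make_true_data(times, 0.0, rng)  # "true" data without noise
noisy = make_true_data(times, 0.2, rng)  # "true" data with added Gaussian noise

# A candidate model is scored only by its misfit to the data; it never
# sees correct_model itself.
candidate = [2.0 + 0.4 * t for t in times]
error_vs_clean = sum_squared_error(candidate, clean)
```

The key design point is that only the generated data reach the evolving models, so a run succeeds when the structure it evolves happens to match the hidden generator.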
 

A Leaf Photosynthesis Test Model: These experiments used a set of complex models that simulate the physiological ecology of a leaf undergoing photosynthesis, specifically the dynamics of the carbon, water and heat budgets of the leaf over time. Several sub-models simulate the leaf's response to variation in different environmental factors. Soil water potential, herbivory, and ozone effects are also included in the models.

The model comprised six ordinary differential equations that describe the state variables and fluxes. External forcing functions accounted for the influence of light intensity and duration, temperature, humidity and wind velocity, and feedback loops linked the various model subcomponents together. The nonlinearities and interdependencies in the model produced complex behaviors in leaf temperature, heat content, and water and carbon content.

[Figure: the complete model structure depicted as a STELLA diagram]

Effect of Parsimony (P) and Noise (N) on the Success of the GA Evolving the Correct Data-Generating Model.

Treatment →   -P -N     -P +N     +P -N     +P +N
Success       0/100     0/100     96/100    93/100

Note: the numerator is the number of correct models evolved and the denominator is the total number of replicate runs.

These preliminary results show that our approach succeeds at evolving the correct model structure when the Akaike Information Criterion (+P) is used to ensure parsimonious models, even in the presence of noisy data. Without the Akaike Information Criterion, the evolved models are mis-specified: overly complex, incorrect models that overfit the data.
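To illustrate why the parsimony term matters, here is a minimal Python sketch that compares a fit-only fitness with an AIC-based one. It assumes the common least-squares form of the criterion, AIC = n ln(RSS/n) + 2k, which may differ from the exact form used in our fitness function. The two models and their numbers are made up for the example and are not project results.

```python
import math


def aic_least_squares(rss, n_obs, n_params):
    """AIC for a least-squares fit with Gaussian errors: n*ln(RSS/n) + 2k.

    Some formulations count the error variance as one more parameter; that
    adds the same constant to every model and does not change the ranking.
    """
    return n_obs * math.log(rss / n_obs) + 2 * n_params


def fitness_without_parsimony(rss, n_obs, n_params):
    """Fit only (lower is better); the parameter count is ignored."""
    return rss


def fitness_with_parsimony(rss, n_obs, n_params):
    """Fit plus complexity penalty (lower AIC is better)."""
    return aic_least_squares(rss, n_obs, n_params)


# Made-up illustrative numbers, not results from the project.
N_OBS = 50
MODELS = {
    "simple": {"rss": 2.00, "n_params": 2},
    "complex": {"rss": 1.95, "n_params": 6},
}


def preferred(fitness):
    """Return the name of the model with the lowest fitness score."""
    return min(
        MODELS,
        key=lambda name: fitness(MODELS[name]["rss"], N_OBS, MODELS[name]["n_params"]),
    )


print(preferred(fitness_without_parsimony))
print(preferred(fitness_with_parsimony))
```

With these numbers the fit-only score prefers the six-parameter model because its residual error is slightly smaller, while the AIC-based score prefers the two-parameter model because the small gain in fit does not pay for four extra parameters.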


Publications:

Here we describe some of our results with this new approach, both with synthetic test data and real field data of the zebra mussel invasion of Lake Champlain (see #6).

  1. Hoffmann, J.P., Ellingwood, C.D., Bonsu, O.M. and Bentil, D.E. 2002. Turning genes off and on: Using genetic algorithms with complexity-based fitness for model selection in ecology. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2002) – Workshop: Special Session on Biological Applications of Evolutionary Computation, pp. 38-40.
  2. Bentil, D.E., Bonsu, O.M., Ellingwood, C.D. and Hoffmann, J.P. 2003. Deterministic uncertainty in population growth. 4th IEEE International Symposium on Uncertainty Modeling and Analysis (ISUMA). IEEE Computer Society Press, Los Alamitos, CA, pp. 274-278.
  3. Hoffmann, J.P., Ellingwood, C.D., Bonsu, O.M. and Bentil, D.E. 2004. Ecological model selection via evolutionary computation and information theory. Invited paper for special issue on biological applications of evolutionary computation. Journal of Genetic Programming and Evolvable Machines 5(2): 229-241.
  4. Osei, B.M., Hoffmann, J.P., Ellingwood, C.D. and Bentil, D.E. 2005. Probabilistic Uncertainty in Population Dynamics. WSEAS Transactions on Biology and Medicine 1(2): 51-56.
  5. Hoffmann, J.P. 2005. Darwin and Computational Ecology: How Simple Computational Models of Evolution Help our Search for Better Models of Ecological Systems. Keynote Address: In Proceedings of Open International Conference on Modeling and Simulation - OICMS 2005, Hill, D.R.C., V. Barra, and M.K. Troer (Eds.), Blaise Pascal University, France, pp. 27-39.
  6. Hoffmann, J.P. 2006. Simultaneous Inductive and Deductive Modeling of Ecological Systems via Evolutionary Computation and Information Theory. Transactions of the Society for Modeling and Simulation International – Special Issue on Ecological and Environmental Simulation. Simulation 82(7): 439-450.
  7. Bentil, D.E., Osei, M.B., Ellingwood, C.D. and Hoffmann, J.P. 2007. Analysis of a Schnute Postulate-based Unified Growth Model for Model Selection in Evolutionary Computations. BioSystems 87 (In Press).

Workshop:

Information on our 2003 Evolutionary Computation workshop with Dr. David Goldberg is available on the workshop page.


Last updated: December 4, 2006