What has been done so far

As mentioned in section 1.4, there are two major research areas in computer science that are strongly inspired by natural evolution and genetics: Artificial Life and Evolutionary Computation. In both areas, the body of literature has grown so large that it is impossible to present, or even know, every idea that has been published. This is not unusual: in nearly all fields of research one has to find a compromise between broad, interdisciplinary knowledge (which is good for generating new ideas) and complete knowledge of a specialized subfield (which makes a good expert). This overview of recent work in Artificial Life and Evolutionary Computation therefore concentrates on approaches that are closely related to the ideas presented in this thesis. In the section on Artificial Life, these are mainly publications about modelling cells and especially multicellularity, since multicellularity is nature's best example of organizing the cooperation of many small parts towards a higher goal. The section on Evolutionary Computation presents approaches that either use such cooperation for building problem solutions or evolve solutions that themselves exhibit cooperative behaviour.


Simulating natural processes: Artificial Life

Simulation and modelling of biological processes has a long history in computer science. Until 1987, however, the field had no common name, which made it very difficult to find all the literature sharing the same objective. For this reason, Christopher G. Langton organized the "First Workshop on Artificial Life", which gave the branch its name and made scientists from very different areas aware of the many similarities in their work and visions.

Just as there are many different areas of research in biology, there are many different approaches to Artificial Life. If you have read everything up to this point, you know at least two very important processes of life on earth: evolution and gene regulation. The latter is one of the key prerequisites for the development of an individual creature from a single egg cell to the fully grown organism. This development is called ontogeny. As we have seen in sections 1.2.3 and 1.3, these two developmental processes (evolution as the development of a species and ontogeny as the development of an individual member of that species) work very differently. Even though some developmental stages of an individual resemble evolutionary stages of its species, the mechanisms are of a totally different kind: the driving forces of evolution come from outside (variation and selection are mainly effects of the environment), whereas ontogeny is mostly an effect of internal information and interactions, only modified by the environment.

Phylogeny

Evolution is also called phylogeny. This process is the basis for the whole branch called Evolutionary Computation: without Darwin's seminal ideas, we would not be able to breed problem solutions, so evolution is probably by far the most frequently simulated natural mechanism. The practical success of applying its principles also strengthens the theory of evolution itself. There are also researchers in Artificial Life who want to find out more about natural phylogeny. Larry Bull, for example, examined the initial conditions for the emergence of multicellularity [Bull, 1997] by comparing the fitness of unicellular and simple multicellular simulated organisms under different conditions.

Multicellularity

A wide and frequently covered area of research is morphogenesis, or pattern formation. The main question here is: what are the conditions for the development of the forms found in nature? Since shapes, structures and patterns in nature are usually built of, and by, many cells that have grown into this form, researchers working on these questions also have to model multicellularity.

One very popular approach is to use Cellular Automata (CA). A cellular automaton is a mathematical model of cells and their interactions, originally conceived by Stanislaw Ulam and John von Neumann and presented by the latter in [von Neumann, 1966]. Von Neumann wrote:

The question we hope to answer ...is: what are the basic principles which underlie the organization of these elementary parts in living organisms?
The model is very far removed from its natural example, but it is a good basis for studying complex interacting systems in general. Moshe Sipper describes cellular automata in [Sipper, 1997] as

...dynamical systems in which space and time are discrete. A cellular automaton consists of an array of cells, each of which can be in one of a finite number of possible states, updated synchronously in discrete time steps, according to a local, identical interaction rule. The state of a cell at the next time step is determined by the current states of a surrounding neighborhood of cells.
Hans Meinhardt, for example, uses differential equations in [Meinhardt, 1995] to develop many different patterns with striking similarity to those found on sea shells; according to [Pfeifer et al., 2000], these can easily be reformulated as cellular automata rules.
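
To make this definition more concrete, the following small Python sketch implements the scheme from Sipper's description for a one-dimensional, two-state automaton. The particular rule number and lattice size are arbitrary illustrative choices, not taken from any of the cited works.

  # Minimal 1D binary cellular automaton illustrating the scheme Sipper
  # describes: finite states, synchronous updates, one identical local rule.
  # The rule number (110) is an arbitrary illustrative choice.

  def step(cells, rule=110):
      """Apply one synchronous update; neighbourhood = left, self, right."""
      n = len(cells)
      return [
          (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
          for i in range(n)
      ]

  cells = [0] * 31
  cells[15] = 1                      # single seed cell in the middle
  for _ in range(15):
      print("".join(".#"[c] for c in cells))
      cells = step(cells)            # all cells change state at once

Each printed row is one time step; even this tiny rule table produces the kind of complex global patterns from purely local interactions that make cellular automata attractive for pattern formation studies.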

Another approach to simulating cells is to construct an Artificial Chemistry. Here, rules determine how different substances react when they come into contact, and the substances can diffuse into the local environment. Chikara Furusawa and Kunihiko Kaneko use this method in [Furusawa and Kaneko, 1997] to study cell differentiation. In [Furusawa and Kaneko, 1998], they simulate multicellular pattern formation with an artificial chemistry, and in [Kaneko and Yomo, 2000], Kunihiko Kaneko and Tetsuya Yomo find that differentiation on the basis of the modelled interactions can lead to discrete cell types that are not mixed by recombination. Cefn Hoile and Richard Tateson also use such a reaction-diffusion system in [Hoile and Tateson, 2000], but with the difference that they use an artificial neural network to produce the reaction output. Hiroaki Kitano and his coworkers present their Virtual Biology Laboratories in [Kitano et al., 1997], in which they try to simulate all the biochemical processes happening in cells in as much detail as possible.
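
The following toy sketch in Python shows the two ingredients such a system combines: a local reaction rule applied in every cell and a diffusion step that spreads the substances to neighbouring cells. The reaction itself and all rates are invented for illustration and do not reproduce any of the cited models.

  # Toy reaction-diffusion system on a 1D row of cells: each step applies a
  # local "reaction" rule to the substance concentrations in every cell and
  # then lets the substances diffuse to neighbours. All rates are invented
  # for illustration and do not come from any of the cited models.

  N = 20
  a = [1.0] * N          # concentration of substance A per cell
  b = [0.0] * N
  b[N // 2] = 1.0        # substance B seeded in one cell

  def react(a_i, b_i):
      """Arbitrary reaction: B catalyses conversion of A into B, B decays."""
      converted = 0.1 * a_i * b_i
      return a_i - converted, b_i + converted - 0.02 * b_i

  def diffuse(x, rate):
      """Each cell exchanges a fraction of its content with its neighbours."""
      return [
          x[i] + rate * (x[(i - 1) % N] + x[(i + 1) % N] - 2 * x[i])
          for i in range(N)
      ]

  for _ in range(100):
      a, b = map(list, zip(*(react(ai, bi) for ai, bi in zip(a, b))))
      a, b = diffuse(a, 0.2), diffuse(b, 0.05)

  print(["%.2f" % v for v in b])   # B has spread out from the seeded cell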

Gene Regulation

There are also various approaches to modelling gene expression networks based on the gene regulation mechanisms introduced in section 1.2.3. Most of these works concentrate on the interactions without modelling diffusion, spatial dimensions or growth, so the simulations only cover the processes within single cells. Shoudan Liang et al. use boolean networks in [Liang et al., 1998] to induce gene regulation models of a single cell from observed data. Sean Luke et al. use gene regulation abstracted from the natural example of Drosophila to evolve deterministic finite state automata [Luke et al., 1999]. Thorsten Reil examines in [Reil, 1999] the dynamics of gene expression networks based on template matching in a DNA-like sequence and draws interesting conclusions for artificial and biological ontogeny. Jan T. Kim likewise introduces a formalism for modelling gene regulation networks that stays quite close to the natural example [Kim, 2001]. But all of these only model the interactions within one single cell.
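
As an illustration of the general flavour of such single-cell models, here is a minimal random boolean network in Python: each gene is either on or off, and its next state is a fixed boolean function of a few regulator genes. The network size, the number of regulators per gene and the random wiring are invented and do not reproduce the cited systems.

  import random

  # Toy random boolean network: each gene is on (1) or off (0), and its next
  # state is a fixed boolean function of K regulator genes, all updated
  # synchronously. Size, K and the wiring are invented for illustration.

  random.seed(1)
  N, K = 8, 2
  inputs = [random.sample(range(N), K) for _ in range(N)]    # regulators per gene
  tables = [[random.randint(0, 1) for _ in range(2 ** K)] for _ in range(N)]

  def step(state):
      """Synchronously update every gene from its regulators' current states."""
      return tuple(
          tables[g][sum(state[r] << j for j, r in enumerate(inputs[g]))]
          for g in range(N)
      )

  state, seen = tuple(random.randint(0, 1) for _ in range(N)), []
  while state not in seen:           # iterate until the network revisits a state
      seen.append(state)
      state = step(state)
  print("attractor reached after", len(seen), "steps:", state)

Because the state space is finite, such a network always ends up on a cyclic attractor; the attractors of these dynamics are commonly interpreted as the stable expression patterns, i.e. the cell types, of the modelled cell.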

Peter Eggenberger uses the same principle of template matching in a DNA-like sequence to model gene regulation networks, but expands it by adding spatial dimensions and simulating multicellular growth processes. He uses this approach both for pattern formation in [Eggenberger and Dravid, 1999] and for the morphogenesis of 3D shapes in [Eggenberger, 1997].

Swarm Behaviour

In their important book [Maynard Smith and Szathmáry, 1995], John Maynard Smith and Eörs Szathmáry compare a termite mound with a well-designed house:

Although the mound resembles a human building in having features ensuring the comfort of its inhabitants, it differs in that not one of its builders had a picture of the completed structure before building started. The structure emerged from the rule-governed behaviour of tens of thousands of interacting workers. In this, the mound resembles a human body rather than a human building. The body is built by the rule-governed actions of millions of cells. Nowhere is there anything resembling a blueprint of the body. At most, the genome is a set of instructions for making a body: it is not a description of a body.

The resemblance between the development of an insect colony and of an organism has led to the concept of a 'superorganism'. The analogy has some value.
So according to John Maynard Smith, swarms of ants, termites or bees can in many respects be compared to multicellular organisms. Jesper Hoffmeyer draws the same comparison from the opposite direction, likening the body of multicellular creatures to an animal swarm. He writes in [Hoffmeyer, 1997]:

The body can be understood as a swarm of cells and tissues which, unlike the swarms of bees or ants, stick relatively firmly together.
There has been quite a lot of work on simulating swarm behaviour by modelling mobile reactive agents that interact in a simulated environment. Jean-Louis Deneubourg and Simon Goss found an explanation for the fact that ants find the shortest path between food sources and the nest and prefer nearby food sources by simulating their interactive behaviour [Deneubourg and Goss, 1989]. The results have even been used to solve computational problems such as the Traveling Salesman Problem [Colorni et al., 1992]. Craig Reynolds simulated flocking birds [Reynolds, 1987], and Maja Mataric has done extensive work on constructing socially behaving robots [Mataric, 1995,Mataric, 1997]. More complex agent-based models, with agents that have internal states, are widely used for all sorts of simulations, including economic models, traffic simulations, ecological simulations and simulation games such as SimCity [Hiebeler, 1994]. But the behavioural rules for the individuals are nearly always formulated manually; there are very few attempts to find good rules by letting them evolve automatically. We will come back to this in the next section.
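
The following Python toy model illustrates the pheromone feedback loop underlying the ant path-finding result. The two-path setup and all constants are invented for illustration and greatly simplify the cited models.

  import random

  # Toy model of pheromone-based path selection: two paths of different
  # length connect nest and food; each ant picks a path with probability
  # proportional to its pheromone level, and deposits pheromone on it, with
  # the shorter path reinforced more strongly because round trips on it are
  # quicker. All constants are invented for illustration.

  random.seed(0)
  pher = {"short": 1.0, "long": 1.0}
  length = {"short": 1, "long": 2}
  traffic = {"short": 0, "long": 0}

  for _ in range(500):
      p_short = pher["short"] / (pher["short"] + pher["long"])
      path = "short" if random.random() < p_short else "long"
      traffic[path] += 1
      pher[path] += 1.0 / length[path]    # shorter path => more deposit per time
      for k in pher:                      # pheromone slowly evaporates
          pher[k] *= 0.99

  print(traffic)   # the short path ends up carrying most of the traffic

No individual ant compares path lengths; the preference for the shortest path emerges purely from the positive feedback between path choice and pheromone deposition.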


Breeding problem solutions: Evolutionary Computation

The use of evolutionary principles for breeding solutions to computational problems, instead of deriving them analytically, began in the late 1950s with the work of R. M. Friedberg [Friedberg, 1958]. But the invention and application of the approaches still in use today started in the 1960s.

Evolutionary Strategies and Evolutionary Programming

The first two important methods were Evolutionary Programming (EP), invented by Lawrence Fogel [Fogel et al., 1965], in which finite state automata are evolved, and Evolutionary Strategies (ES), numerical optimization on the basis of evolutionary principles, pioneered by Ingo Rechenberg and Hans-Paul Schwefel [Schwefel, 1965,Rechenberg, 1970]. Together with the next development, Genetic Algorithms, evolutionary strategies are the best known and so far the most widely used Evolutionary Algorithms (EA).

Genetic Algorithms

Genetic algorithms (GA) were invented by John Holland in the early 1970s [Holland, 1975]. The main difference to evolutionary programming and evolutionary strategies is that the variations are not performed directly on the problem solution. Instead, there is an (often binary) representation of the solution, which is varied and mapped onto the real solution by some function. The real solution is then tested on the problem to determine the fitness of the individual. This function is called the genotype-phenotype mapping, because the representation corresponds to the natural genome (the genotype) and the real problem solution corresponds to the living being (the phenotype) that develops on the basis of the genome. This difference to evolutionary strategies is very important: on the one hand, it allows the representation to be shorter and more expressive; on the other hand, the separation of genotype and phenotype usually makes the search for a good solution less effective, because, depending on the mapping function, a small variation of the genotype can lead to a big variation of the phenotype and vice versa. In evolutionary strategies, the size of the variation steps is reduced as the solution improves, which makes it possible to approach the optimum quickly and then locate it quite precisely without overshooting it. This is normally not possible in genetic algorithms. But not all problem solutions can be represented as real numbers, and for many problems it is much easier to use genetic algorithms. William E. Hart et al. examine the possibilities and impacts of developmental mechanisms in genetic algorithms (including not only the genotype-phenotype mapping but also learning, i.e. an adaptive quality that remains in the problem solution) in [Hart et al., 1995].
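
The following minimal Python sketch illustrates this separation of genotype and phenotype. The toy task (finding an x close to 3), the bit-string encoding and all parameters are invented for illustration.

  import random

  # Minimal genetic algorithm illustrating the genotype-phenotype separation
  # described above: the genotype is a bit string, a mapping function decodes
  # it into the real solution (here a number x), and fitness is evaluated on
  # that phenotype. Task and parameters are invented for illustration.

  random.seed(0)
  BITS, POP = 10, 20

  def decode(genome):                      # genotype -> phenotype mapping
      x = int("".join(map(str, genome)), 2)
      return x * 10.0 / (2 ** BITS - 1)    # scale bits to the range [0, 10]

  def fitness(genome):
      x = decode(genome)
      return -(x - 3.0) ** 2               # fitness is measured on the phenotype

  pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
  for _ in range(50):
      pop.sort(key=fitness, reverse=True)
      parents = pop[:POP // 2]             # truncation selection
      pop = [
          [bit ^ (random.random() < 0.05) for bit in random.choice(parents)]
          for _ in range(POP)              # copy a parent, flip bits with p=0.05
      ]

  best = max(pop, key=fitness)
  print("best x:", decode(best))           # should be close to 3.0

Note how unevenly genotype variations map onto phenotype variations: flipping the most significant bit moves x by about 5, flipping the least significant one by about 0.01. This is exactly the effect described above that makes controlled step sizes, as in evolutionary strategies, hard to achieve in a GA.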

An example of the use of genetic algorithms with cellular automata is described by Moshe Sipper in [Sipper, 1997]. He uses the genetic algorithm technique to evolve good reaction rules for the automaton cells and calls this method Cellular Programming. This is a first approach to breeding behavioural rules for many small units so that they can cooperate towards a higher goal. But its usability is limited, both by the simplicity of cellular automata (regularly placed, synchronously updated, reliable units without internal memory and with quite restricted communication) and by the limitations of the genetic algorithm (compared to Genetic Programming, which is introduced later in this section).

Peter Eggenberger used genetic algorithms together with gene regulation processes for evolving artificial neural networks for given problems [Eggenberger, 1996]. Jens C. Astor and Christoph Adami independently describe a similar combination in [Astor and Adami, 1998].

Genetic Programming

Genetic Programming (GP) is the most recent important model of evolutionary computation. Even though Alan Turing already thought about breeding computer programs in 1950 [Turing, 1950], it was not really done until John Koza published his seminal book [Koza, 1992]. Since then, the genetic programming community has grown rapidly. The main advantages of GP are the following:

  1. GP does not impose a fixed length on the solution. Computer programs evolved by GP have variable length, so the algorithm can itself find the size necessary for a solution. In principle, the maximal length can be extended up to the hardware limits.
  2. GP does not require as much knowledge about the problem and the possible solutions (domain knowledge) as GAs do. This holds firstly because of the advantage mentioned above and secondly because GP does not require a decision on a genotype representation and a genotype-phenotype mapping function. These decisions are crucial for the power of a GA in solving a problem, and they usually require a lot of domain knowledge.
  3. As computer programs are very powerful in describing computations and computational behaviour, so is GP. Using GP, you can theoretically evolve any series of actions a computer can possibly perform, provided that you give the GP algorithm a set of commands to choose from that can describe all possible actions. This set of commands is called the function set (see the sketch after this list). It is normally quite easy to put together a powerful function set and to add or remove functions as they are or are not needed.
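
To make the representation concrete, here is a minimal Python sketch of how a GP individual can be built from a function set and evaluated. The chosen functions, terminals and depth limit are invented for illustration; a real GP system would add selection, crossover and mutation on top of this representation.

  import operator
  import random

  # Minimal sketch of a GP-style program tree built from a function set.
  # Functions, terminals and the depth limit are illustrative choices.

  FUNCS = [("+", operator.add), ("-", operator.sub), ("*", operator.mul)]
  TERMS = ["x", 1.0, 2.0]

  def random_tree(depth):
      """Grow a random expression tree; variable-length solutions fall out
      naturally because trees can take any shape up to the depth limit."""
      if depth == 0 or random.random() < 0.3:
          return random.choice(TERMS)
      name, fn = random.choice(FUNCS)
      return (name, fn, random_tree(depth - 1), random_tree(depth - 1))

  def evaluate(tree, x):
      """Interpret the tree: terminals are the input x or constants."""
      if tree == "x":
          return x
      if isinstance(tree, float):
          return tree
      _, fn, left, right = tree
      return fn(evaluate(left, x), evaluate(right, x))

  random.seed(4)
  prog = random_tree(3)
  print(prog)                # a randomly grown program, e.g. ('+', ..., 'x', ...)
  print(evaluate(prog, x=5.0))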
Of course, genetic programming also has several disadvantages (otherwise, why would people still be using anything else?):

  1. The number of possible programs the algorithm can construct (called the search space) is immense. This is one of the main reasons why people thought it would be impossible to find programs that are good solutions to a given problem. But the problem is not as big as it seems, because there is also an enormous number of ways to construct a solution: you can already build a nearly infinite number of equivalent solutions simply by adding commands that have no influence on the result. It is therefore very improbable that GP finds the best solution, but it is nevertheless able to find a good one. This holds not only for GP but for all evolutionary algorithms: many ES variants and most EP, GA and GP systems are heuristic algorithms that cannot guarantee a solution of predefined quality (e.g. the best one), but normally find very good solutions.
  2. Depending on the programming language evolved by the GP algorithm, finding a good solution can take a very long time. GP that uses machine code is usually very fast. But if a high-level language is used that must be compiled and creates big programs (e.g. C++), it can get very slow. On the other hand, high-level languages have many advantages: there is a much bigger choice of specialized commands for the function set, which can make the search space smaller; it is much easier to start from already known good solutions and optimize them; and the resulting programs can be easier for humans to understand. Moreover, as will be shown in section 2.1.2, high-level languages provide functions and methods that make it much easier to evolve complex programs (e.g. by using classes and objects).
  3. Even though the variations in GP are usually performed directly on the problem solution (the program code), it shares with GAs the same disadvantage compared to ES: a very small variation (e.g. changing one command) can have huge effects on the quality of the solution. In GP this problem is normally even more serious than in GAs. There is a high probability that even a very small variation has a disastrous effect on fitness, or that a variation has no effect at all. Graphically, this can be described by saying that the fitness landscape is very rugged. It is a serious problem, because the main reason why evolutionary algorithms beat random search is that better solutions are selected and have a higher probability of producing better offspring (i.e. variations of good solutions are usually better than variations of bad ones). Fortunately, this still holds for GP, but the ruggedness of the fitness landscape makes it much more difficult to find the optimum.
There is a vast body of literature on genetic programming, but so far very few approaches have tried to use it for optimizing the cooperation of many small entities. On the one hand, this is astonishing, because organizing cooperation in such a swarm of independently acting units is a very hard problem that cannot easily be solved analytically; so why not try evolution on it? On the other hand, precisely because it is such a hard problem, it may well be too complex for current EC techniques. We will see later that the new communication and control paradigm described here may bring the problem within reach of evolutionary computation, even for complex environments and behaviours. But first we will look at some other publications that have taken steps towards evolving cooperation with genetic programming.

Thomas Haynes et al., who come from Multiagent Systems (MAS) research, use strongly typed genetic programming in [Haynes et al., 1995] to evolve behavioural strategies for the agents in a multiagent domain. As their testing domain they use the predator-prey pursuit problem, which is well known in the MAS research community. Their agents do not communicate, and they all follow the same evolved rules. Sean Luke and Lee Spector use a very different and quite interesting approach for evolving teams of agents that follow different rules [Luke and Spector, 1996]: they use Automatically Defined Functions and Automatically Defined Macros to evolve a whole team of different agents as one GP individual, where every function or macro encodes the behavioural rules of a different agent.

The disadvantages of all the approaches mentioned are that the resulting behavioural strategies are not very flexible in adapting to dynamic environments, that there is no or only very limited communication between the acting units, and that the resulting agents are either homogeneous [Haynes et al., 1995] or heterogeneous [Luke and Spector, 1996,Sipper, 1997] but cannot change their strategies while cooperating, for example when more agents for a special function are needed. This does not allow evolving the kind of high-level multiagent systems whose behaviours have so far only been handcrafted.

As far as we know, nature uses a different approach. As we have seen in sections 1.2.1 through 1.2.3, all cells contain the same control program, the genome. Nevertheless, there is very complex interaction between the cells, which differentiate into many different types and perform very different functions. The basis of this, gene regulation, appears to be a solution to the problem of how to evolve a team of many heterogeneous agents. Moreover, this mechanism implies a totally new kind of communication which is powerful even though its message structure is very simple. The use of gene regulation has many advantages, but we will come back to that later.

This is not the first work to stress the power and importance of gene regulation for evolutionary computation. Sean Luke, Shugo Hamahashi and Hiroaki Kitano wrote in [Luke et al., 1999]:

While evolutionary computation has drawn some inspiration from modern genetics, by and large the inspiration for this field has been aging Mendelian ideas ...Even Melanie Mitchell's highly-regarded GA text [Mitchell, 1996] introduces biological genes thus: "A chromosome can be conceptually divided into genes--functional blocks of DNA, each of which encodes a particular protein. Very roughly, one can think of a gene as encoding a trait, such as eye color. The different possible 'settings' for a trait (e.g., blue, brown, hazel) are called alleles." This is not how it actually works ...Genes form a complex network of interrelationships and regulation which, when seeded with initial chemical concentrations distributed throughout cells, results in miniature chemical machines ...In the rest of this paper, we begin by describing our work in modelling an interesting area of developmental genetics, gene regulation, which we think has particular utility to many new domains in evolutionary computation.
Hassan Masum and Franz Oppacher also saw the power of gene regulation for evolutionary computation. They wrote in [Masum and Oppacher, 2001]:

We focus particularly on regulatory networks and embryogenesis, as these provide a mechanism that is key in developmental biology and genetics but has not been substantively used to date in EC.
Apparently, these works were not enough to make many researchers take notice of these powerful, as yet unexplored mechanisms. The researchers cited above saw the potential for EC, but as they mainly work in Artificial Life research, they apparently did not pursue the idea further and have not yet produced systems that exploit this power in real applications. As we will see in section 2.1.1, my ideas for Object-Oriented Ontogenetic Programming and the use of gene regulation for evolutionary computation also developed out of Artificial Life studies that modelled these processes, and out of the resulting question why nobody had yet used this powerful mechanism for solving practical problems. I hope that this work can at last make more people recognize the possibilities of gene regulation mechanisms for automatic problem solving and adaptive systems design, and that it starts a whole new area of research: Evolution of Distributed Intelligence (EDI).

