Next: The Goal: Evolution of Up: How to Breed Multicellular Previous: Genetic Programming Contents Index

Subsections

The Object-Oriented Ontogenetic Programming System

The Object-Oriented Ontogenetic Programming System is an ensemble of programs which work together for breeding multicellular programs. The underlying techniques are object-oriented ontogenetic programming for the multicellular programs and genetic programming for the breeding. For coping with the problems of OOOP and GP that are described in sections 2.1.4 and 1.5.2, OOOPS introduces some extensions to GP. First, we will look at the architecture of the system and its functioning. In section 3.2.2, we will then discuss the extensions of OOOPS to the basic GP approach shown in section 3.1.1.

The System

It is impossible to describe OOOPS in all detail here, because it is such a big system that its complete description would fill a whole book of its own. This section tries to give at least a good overview of the main features and workings of the Object-Oriented Ontogenetic Programming System. Because of limited space, there are also several unexplained technical terms which are common knowledge for programmers.

As was already mentioned in section 2.1.4, the Object-Oriented Ontogenetic Programming System is designed such that it can run in parallel, distributed on several computers. For allowing this, OOOPS is organized as a client-server architecture. There are three different programs that have been created to build OOOPS:

The evolution manager (Ooopse). This is the server. It runs on a central computer and has access to the Ooopse database (Ooopsed) which is used for storing data about the current population.
The individual manager (Ooopsi). There are several of those programs which act as clients. They can run on different computers and communicate via a network connection and TCP/IP. They communicate as well with the Ooopse as with each other.
The individual. These are the OOOP programs that are being evolved. One Ooopsi can manage several individuals. The individuals are varied, compiled, run and evaluated by the Ooopsi. For control and information flow between the Ooopsi and the individual, two types of inter-process-communication are used: piping and shared memory.

**Figure 3.3:** Parts of OOOPS and their communication.
$\resizebox* {0.75\columnwidth}{!}{\includegraphics{ooops-modules.eps}}$

Figure 3.3 shows these different parts of OOOPS and their interactions. All parts are described in more detail in the following.

Database (Ooopsed)

Ooopsed is a relational database which stores all important information about the evolutionary run. As database management system, PostgreSQL is used because it is free and has a lot of practical features. Plus, there are many developers and a good documentation. The database has the following structure:

< count: | {int4 indid, int4 geneid, int4 evaluations} >
< gene: | {int4 geneid, float4 genefitness, int4 numcommands, int4 numrequire, int4 numinhibit} >
< ind: | {int4 indid, float4 indfitness, text address} >
< indgene: | {int4 indid, int4 geneid, int4 used} >
< parent: | {int4 indid, int4 numdescs, text code, text results} >

The count table includes counters for the individual ID, the gene ID and the number of evaluations already performed by the system. A new individual ID is needed when an individual dies and is replaced either by the descendant of another individual or by a newly initialized individual. A new gene ID is given to every gene when varied and of course also to newly initialized genes. Individuals only exist once. They have a specified address which consists of the host address, the port number of the Ooopsi and the individual number in the Ooopsi (because there can be more than one individual in an Ooopsi). But the same gene can exist in several individuals because genes are often copied into another individual by conjugation⁸. Therefore, a gene gets a new ID when it is varied to distinguish it from the other copies. Thus, a gene ID clearly denotes one type of gene described in the gene table. And the indgene table contains all the gene copies with their host individuals and the information how often the gene has been executed during the last evaluation of the individual. In contrast, an individual can keep its ID when it is varied because there are no other individuals with the same ID anyway. The numcommands, numrequire and numinhibit fields of the gene table are only used for making statistics about the evolutionary development.

The parent table is needed for storing information about offspring. As OOOPS works asynchronously⁹ and therefore only can vary or select one individual at a time, offspring cannot be inserted into the population at once when an individual is selected for reproduction. A reproducing individual does not die. But a new individual can only replace a died individual and take its address. So the children have to wait in the database (in the parent table) until another individual dies and they can take its place. The indid is the ID of the parent individual, numdescs contains the number of children it is allowed to produce, code is the sourcecode of the parent individual and results contains the evaluation results which made Ooopse select this individual for reproduction. When an individual dies, a random child is chosen from the parent table to take its place.

Evolution manager (Ooopse)

Ooopse is the central server program which controls the evolutionary run by initializing new individuals, taking care of the Ooopsed database, deciding on selection and variation and organizing the reproduction. Ooopse is divided into several modules that take on different parts of the work. As the whole system is implemented in object-oriented manner with the C++ programming language, these modules are realized as different objects. Figure 3.4 gives an overview of the architecture of the Ooopse program.

**Figure 3.4:** The Ooopse program.
$\resizebox* {0.75\columnwidth}{!}{\includegraphics{ooopse.eps}}$

The server() function is the central module which is constantly called by the main() function. It answers requests coming from Ooopsis by forking the Ooopse program¹⁰ and calling functions of the other modules. The server() gets requests from an Ooopsi and answers them through the Socket. More precisely, there are two sockets, a ServerSocket which listens for incoming requests and a ClientSocket with which the Ooopse could request the source code of a specific individual from its hosting Ooopsi. This is not needed for the functioning of the system, but it allows the user at the central computer to get the sourcecode of any individual that is registered in the database.

Ecomm contains all the other functions necessary for communicating with the Ooopsis, for example functions for converting information which is to be sent into a serial format and back to the original datastructures.

UI is meant to be the object controlling the user interface. As there is nearly no user interaction until now, its only function is to save statistical data to the harddisk and draw graphs by sending the data to gnuplot¹¹ through a pipe.

Dcomm is the most important of the Ooopse modules. It not only contains functions for saving the results of individual evaluations to the database, but it also includes all functions that decide on what to do with each single individual. So this module also decides on selection and variation. It makes sense not to construct separate objects for these functions because they must operate directly on the database. All decisions are based on the information contained in the database and the functions have to return information from the database as for example the address of the donor individual for gene conjugation. But except deleting an individual from the database when it dies, Dcomm only decides but does not execute its decisions itself. The decisions are sent to the Ooopsi for execution. Later in this section, we will see more exactly how the different parts of OOOPS work together by looking at the control flow of a typical evolutionary run.

Individual manager (Ooopsi)

The Ooopsi is a client program that contains the sourcecode of some individuals of the population and takes care of everything concerning these individuals. It compiles, runs and evaluates one individual after the other, sends the results to the Ooopse and then executes the commands (e.g. concerning variation) received as an answer. Thereafter, it compiles, runs and evaluates again. The main parts of the Ooopsi program are shown in figure 3.5.

**Figure 3.5:** The Ooopsi program.
$\resizebox* {0.88\columnwidth}{!}{\includegraphics{ooopsi.eps}}$

In the Ooopsi, the main() function not only calls the server() function, but also manages all the other activities. The server() function here is only used for answering requests for individuals or genes that are managed by the Ooopsi. These requests can either come from another Ooopsi (which is the standard for gene conjugation) or from the Ooopse. As in the Ooopse, the server() function uses the Socket object for communication via TCP/IP. It also has access to the Code module where the sourcecodes are stored.

The Code module not only stores the sourcecode of the individuals but it also compiles and varies them based on the instructions received from the Ooopse. It can probably be seen as the most important part of the Ooopsi. It takes care of everything related to the program code of the individuals. Especially complex and important are the different possibilities for variation. They are described separately later in this section.

The Icomm module handles the communication with the individual. So in contrast to the Code object which manages the individual code including the compilation, the Icomm object is in charge of the individual executable. It starts and stops the runs of the individual, sends other instructions (which are mostly problem specific) through a pipe and reads information from the individual through shared memory. This information can for example be the current state of the individual for visualization of its dynamics in a user interface, it can be reactions of the individual which are fed back into the external problem environment managed by the Problem module, or it can be outputs of the individual that are used for evaluating its fitness.

The Problem module is only used for some types of problems. If OOOPS is used for the evolution of distributed intelligence where the problem solution is only the genome of the units and the problem environment is modelled in the individual, the Problem module is not used as there is no external problem environment. But if OOOP is used as a modularization technique and the problem solution is the whole individual, the problem environment is located outside of the individual. In this case, the problem environment will usually be read by the main() function (using the Problem object) which will then feed the environmental inputs into the individual through the Icomm object. These environmental inputs can for example be fed into the individual by varying the message production of special external message sources that can be arbitrarily located in the cell space¹². Like this, usual GP training- and testcases can be fed into the individual¹³.

The Ecomm module does in the case of the Ooopsi not only serve as function package for conversion of message formats used for the communication between the Ooopse and the Ooopsis. Except for the server() function which directly uses the Socket object¹⁴, the Ooopsi does not directly access the Socket but communicates through the Ecomm module. So this module handles the whole communication with the other Ooopsis and the Ooopse except answering gene or individual requests.

Individual

The individual has already been described in section 2.2.

Selection

Selection in OOOPS is carried out by the Ooopse¹⁵. First, the individual is ranked by its fitness¹⁶. Depending on this rank, the individual can be selected to die, in which case it is neither a candidate for reproduction nor (of course) for variation. There is a parameter mortalPercentage that defines which percentage of the population has a death probability greater than zero. The individuals that appertain to the mortal part of the population are those that have the lower fitness ranks. The individual with the lowest fitness rank has a death probability of almost 1¹⁷ and this probability decreases linearly with increasing fitness rank until it reaches 0 at the first individual that belongs to the immortal part of the population.

If the individual does not die, the Dcomm.select() function decides whether the individual is to be varied itself or if it is part of the elite which continues unvaried into the next generation. There is a parameter elitistNum which defines how many individuals belong to the elite. An individual remains unvaried in the population if it ranks among the elitistNum fittest individuals.

At last, the Dcomm.select() function decides whether the individual is allowed to produce a child. The individual's reproduction probability is computed analogous to its death probability but with the difference that the probability is 1 for the fittest individual and it decreases linearly with decreasing fitness rank until the first infertile individual is reached. As every child can only replace a dead individual, for not producing too many children the average reproduction probability should be about the same as the average death probability. Therefore, numFertiles is set equal to numMortals. As described earlier in this section, children are not produced directly but noted in the database until they can replace a died individual. Because the parent will probably not stay unvaried until the child is used, the code of the parent is transferred and saved to the database when it is selected for reproduction. This not only ensures that the child will not be produced from a parent that has already lost its good qualities (or has even died) but it also means that there are some copies of the fittest individuals in the database most of the time. The user can take a look at them at every stage of the evolutionary run without requesting separate transmissions¹⁸.

Variation

Variation in OOOPS is handled both by the Ooopse and by the Ooopsis. The Ooopse function Dcomm.vary() decides on which type of variation is to be executed on the individual and who is the donor for gene or code conjugations. These decisions are then sent to the Ooopsi which follows the instructions by deciding on how exactly to execute the variation types, requesting the needed code parts from the donor and executing the variations. There are three main types of variation (gene conjugation, code conjugation and mutation) and several subtypes (gene deletion, gene insertion, code deletion, code insertion and different types of mutation).

Gene conjugation is an exchange of complete genes between individuals. It is divided into the two subtypes gene deletion and gene insertion¹⁹. As the name implies, gene deletion removes genes from the individual genome. On the basis of chance and probabilities that are adjustable by parameters, Dcomm.vary() decides if gene deletion is to be applied and if yes how many and which genes are to be taken away. Analogously, it decides on gene insertion²⁰ with additionally deciding about the donor individual for the inserted genes.

Code conjugation exchanges parts of the evolvable code of two distinct genes. If the donor gene is part of an individual in another Ooopsi, this gene also has to be requested for transfer from this Ooopsi by the host Ooopsi of the varied individual. Like gene conjugation, code conjugation can be divided into two subtypes code deletion and code insertion. The same reasons for this apply as for the two subtypes of gene conjugation. Code conjugation most closely resembles classical GP crossover, because it can choose arbitrary parts of the code for the mixture. In contrast, gene conjugation differs from classic GP crossover in that the boundaries of the exchanged parts (which correspond to the crossover points) are given by the division of the program into separate genes. These boundaries are not a product of chance in the crossover process, but their content evolves²¹. This is a reason why the division into genes realized in object-oriented ontogenetic programming is a promising technique for finding good building blocks that can be combined by gene conjugation into a better overall program²².

In the first experiments with OOOPS, code conjugation was not yet used. Perhaps, it is not even needed at all²³. One could argue with Kevin J. Lang and Peter J. Angeline [Lang, 1995,Angeline, 1997] that the classical GP crossover is not more than a macromutation operator that does not help in finding and reusing good building blocks but only makes big changes to the code and as such could be an important part of GP in providing big jumps through the search space. But in OOOPS, classical crossover (resp. the corresponding code conjugation) or another form of more explicit macromutation is perhaps not needed because of two reasons:

If there are no children waiting in the parents table of Ooopsed when an individual dies (which periodically seems to happen), a new random individual is being initialized for replacing the died individual. These new individuals provide totally new starting points for the evolutionary search and as such serve the same purpose as a macromutation operator.
As Hassan Masum and Franz Oppacher claim in [Masum and Oppacher, 2001], the use of gene regulation mechanisms for developing the phenotype from the genotype has the effect that small changes in genotype can lead to big but still viable changes in phenotype. This permits macromutations on the phenotypic level without the deleterious effects that often result from macromutations on the genotypic level. As such, this mechanism also provides some sort of (possibly even better) macromutation without the use of crossover.

Mutation in OOOPS is a random variation of the evolvable parts of a single gene. As there are several evolvable sections in a gene, there are also several different types of mutation. The Ooopse only decides which gene is to be mutated and the Ooopsi then chooses the type and if necessary the exact place of mutation and changes the individual code accordingly. First, there is the evolvable code of the gene. A mutation of the evolvable code in the current implementation of OOOPS is simply a deletion of a random single command or the insertion of a single command at a random position. Of course, only commands which are taken from the function set can be inserted. The function set is created depending on the problem that is to be solved. But until now, only a function set suited for linear programs was used because this makes the implementation of code conjugation and mutation much easier compared to the use of tree structures which are common for classical GP²⁴.

But the OOOP paradigm implies some more variable parameters of the genes that do not exist in classical programming but are central determinants of the behaviour of object-oriented ontogenetic programs. These are the message production of the gene (which consists of the message type and the intensity of production) and its execution conditions. In OOOPS, the execution conditions are divided into requirements and inhibitors (which both also consist of a message type and an intensity). Requirements define a concentration of a specific message that must be reached at the cell's position for executing the genetic code. Inhibitors in contrast to this define the concentration of a message that must not be reached if the gene is to be executed. All these parameters can be mutated by exchanging either the type or the intensity by a random new value (within predefined limits)²⁵.

In OOOPS, all types and possibilities of variation have their own probabilities and can all be applied to the same individual if all happen to be chosen. This also enables a sort of macromutations, but the single probabilities should normally be chosen so that in average not much more than one mutation type at a time is applied.

Control flow of a typical evolutionary run

A typical evolutionary run with OOOPS proceeds as follows:

First, the Ooopse server is started. It signals that it is ready to answer requests. Then, you can start as many Ooopsis as you want. The number of Ooopsis and the number of individuals managed by one Ooopsi determine the population size. As you can start a new Ooopsi at every stage of the evolutionary run and every Ooopsi can theoretically be set to manage a different number of individuals, the population size is not necessarily static.
When started, an Ooopsi first contacts the Ooopse for getting an initial set of individuals. These are created by the Ooopse on the basis of an individual template and a random but limited number of initial genes that can contain random values²⁶ for the variable parts.
Then, the main evolutionary loop is entered in the Ooopsi. For doing the first evaluation, it has to compile and start the individuals. For that purpose, the sourcecode of all individuals is put together so that it can be compiled as one program but the different individuals can still be separately executed. This code is then written to a temporary folder on harddisk and compiled by running the gnu C++ compiler on it.
Next, the first individual is started, and depending on how the individuals are to be evaluated, the communication between the individual and the Ooopsi begins. When the Ooopsi has gathered enough information for rating the fitness of the individual, the latter is stopped and evaluated. The results are then sent to the Ooopse.
When the Ooopse receives such a results message containing the individual ID, the individual fitness, the address, the gene IDs and some more statistical information about the genes, it first saves the results to the database. By doing this, the Ooopse puts the first individual into the population. An individual is only part of the population, if it is registered in the database. And it can only be registered in the database by sending evaluation results to the Ooopse. This ensures that there are no individuals without any known fitness in the population. If the individual is already registered in the database, its entries are updated with every results message.
The Dcomm.saveResults() function of the Ooopse not only saves the information contained in the message to the corresponding database fields. It also computes and updates the fitnesses of the genes contained in the individual. Now, we have a credit assignment problem: How do we know what a specific gene has contributed to the fitness of the individual? David E. Moriarty and Risto Miikkulainen have solved a similar problem of determining the fitness of single identifiable neurons in artificial neural networks in [Moriarty and Miikkulainen, 1997] by computing their fitness as the average fitness of the five best networks they participate in. Analogously, the fitness of a gene in OOOPS is computed as the average fitness of all individuals that contain and use this gene.
As a next step, the Ooopse saves the new information to data files readable by the graph drawing program gnuplot and tells this program to redraw the graph for visualizing the development of best and average fitness of the individuals in the population and possibly other statistical data.
After that, the Ooopse starts the Dcomm.select() function on the individual. As we have discussed earlier in this section, this function decides on what is to be done with the individual.
If the Dcomm.select() function has decided to kill the individual, this is then deleted from the database. As children are always varied copies of their parents and a died individual will be replaced by some child waiting in the database, the deletion of an individual is always followed by deciding about what types of variations to apply to the replacing individual.
Also in the case of a surviving individual which has been selected to get varied, the Dcomm.vary() function is called for determining the type of variation and the donor individual as described earlier in this section.
If the individual has been selected to produce offspring, the Ooopse sends a request for the individual's sourcecode to the Ooopsi for saving it to the database. With this information, the parent table is updated to add the new offspring to the children waiting to be born.
At last, the instructions about what to do with the individual are sent back to the Ooopsi which is still waiting for an answer to its message with the evaluation results.
The instructions from the Ooopse are then executed by the Ooopsi. If the individual has been selected to die, the Ooopsi sends a request to the Ooopse for transferring a parent sourcecode from the database and replaces the old individual with this code. Finally, it executes the variations on the individual code as described earlier in this section.
Now, the run continues at step 4 with starting and evaluating the next individual in the Ooopsi.
This continues until the last individual is varied. Then, the run continues at step 3 by compiling the varied set of individuals.
The evolutionary run is stopped either by the user at any stage of the development or by the Ooopse when new results added to the database make that the population meets a prespecified termination criterion.

Innovations for Genetic Programming

OOOPS is a big system with many features of which some have been described in the last section. Now we will take a closer look at some features of OOOPS (or OOOP) that are new to genetic programming or very rarely realized until now. Table 3.1 gives an overview of the most important innovations that will be discussed in the following.

Table 3.1: Innovations for genetic programming.

Innovations of OOOPS for GP

Introduces genes (used as building blocks)

A new modularization technique

Combines steady-state with global selection methods

Uses conjugation instead of crossover

New reproduction management

Dynamic population size

Multilevel evolution

Cooperative coevolution of genes

Metaevolution possible

Homology possible

Introduces genes (used as building blocks)

Genetic programming did get its name from genetics (because of the operators), but until now it did usually not include genes. As we know, genes are the units in the genotype that describe a separate part of the functioning of the cell (a protein). In crossover, genes are normally not split. We also know that genes can be activated or inactivated. Separate parts of the program that have these genetic attributes are introduced by combining genetic programming with OOOP. We already discussed the advantages of using gene regulation in section 2.1.3. But separate genes might also have another advantage for GP that was already mentioned in section 2.1.1: genes are a new approach to modularization of programs.

A new modularization technique

Genes as separate blocks of code that can evolve through mutation and are recombined into different individuals by gene conjugation are a new technique for evolving reusable program modules. One might argue that only by having the ability to migrate into other individuals and influencing their fitness, which in turn influences the spread of the individual (and with that of the gene) in the population, the gene will evolve towards being a good reusable building block. As genes can spread independently in the population through gene conjugation, it is plausible that not only good individuals but also good genes have a higher probability of survival than bad genes. This implicit genetic evaluation and selection is supported in OOOPS by adding an explicit gene fitness and favouring fitter genes for gene insertion. This has already been described in section 3.2.1 and we will also come back to it later in this section.

In genetic programming, introns²⁷ (or substitutions like explicitly defined introns [Nordin et al., 1996]) have been used as emergent borders for good building blocks. They might also play such a role in nature. But nature additionally has other (more important) mechanisms for defining the borders of a gene. Using introns or their substitutions as emergent borders takes much computing resources during execution of the program or for extracting the intron code before execution or for the genetic operators (in the case of explicitly defined introns).

By using explicit separate genes like in OOOP in which the code evolves, we have predefined borders but emergent content. This has many advantages. First, the crossover operator is very simple, because it only has to mix the genes. It does not have to take care of anything else, neither of producing valid programs nor of computing the probabilities for different crossover points. Second, this simplicity of the crossover operator is not only easy to program but also saves computational resources. Third, this allows to very easily identify and transfer the building blocks between several computers for distributed evolution. Fourth, it enables the use of gene regulation whose pros have been extensively discussed in section 2.1.3. I believe that this technique is also more effective in finding good building blocks than previous techniques (among other things because it enables to apply multilevel evolution as discussed later in this section). This remains to be proved.

Compared to other modularization techniques like automatically defined functions and automatically defined macros, the use of genes in OOOPS does not require to predefine static parameters like for example the number of genes. We have already talked about these advantages of OOOP in section 2.1.3.

Combines steady-state with global selection methods

As we have seen in section 3.2.1, the object-oriented ontogenetic programming system is what I call an asynchronous GP system. This means that the GP individuals reach the different states of the evaluation-selection-variation loop not at the same time. Their life-cycles are chronologically independent of each other. The term "steady-state" refers to the fact that in an asynchronous GP system the biggest part of the population usually is not also just being varied when one individual changes.

Until now, steady-state approaches are intimately tied to tournament selection, because this is the only known selection method that does not compare the evaluation results of all individuals in the population. With tournament selection, the individuals taking part in the tournament can be selected independently of the rest of the population. Tournament selection is a good selection method. It has many advantages on the other methods: A central unit for selection is not needed, there is no need for a global fitness function that produces absolute fitness values²⁸, the evaluation and selection takes less computational resources, this selection method is closer to the natural example than the others, etc. But tournament selection also has some disadvantages:

Tournament selection is not truly an asynchronous method. You always have to synchronize several individuals to build a tournament. One might think: "This is not really a problem. You can just always compare the next n individuals that are ready for evaluation if n is the chosen tournament size." But this is not a good solution. It not only requires a central unit to register individuals that are ready for evaluation (which counteracts the advantage of no need for a central selection instance), but it also produces another much more severe problem: If all individuals would run through the evaluation-selection-variation loop at approximately the same speed (which is quite probable to be the case if the individuals are distributed on comparable and not otherwise used computers or even on the same computer), the tournaments would always consist of the same individuals or their descendants. But in order to compensate for the local quality of evaluation in tournament selection, the tournament composition must be a good random mixture of individuals from the population which should strive to be as representative of the whole population as possible and which should vary as much as possible from one evaluation to the next. If the competitors do not mix enough between different tournaments in the course of the evolutionary run, tournament selection will enforce distinct societies of individuals that evolve separately. This is not the goal. But how can you ensure a good random mixture of participants in all tournaments? For choosing random individuals from the population for each tournament (which is a simple, often used and theoretically good answer) you not only need a central unit which has an overview of the population, but you also destroy the advantages of asynchronous GP and waste a lot of time: The individuals always have to wait for each other to get ready for the competition²⁹.
The organization of tournaments in a system with distributed competitors poses some problems. How should distributed individuals be compared? There are two obvious solutions: 1. Use a global fitness function for separately evaluating the individuals and only compare the results via net communication. 2. Let the individuals come together for making a real tournament. The first solution destroys the most important advantage of tournament selection that it does not require a global fitness function, whereas the second solution produces a lot of net traffic and leaves the question unanswered about where to arrange the tournament.
The parameters influencing the functioning and results of tournament selection are not very fine tuneable. Tournament selection usually works as follows: First, n = tournament size individuals are being selected to compete. Then the competition takes place which determines relative fitness values for the individuals. The best $d\leq \frac{n}{2}$ individuals are then allowed to produce children which replace the worst d competitors. This means that in every tournament, d individuals die. With this parameter d you can adjust the selection pressure, which is a very important measure in evolutionary algorithms. If the selection pressure is chosen too high, the genetic variability in the population can quickly get lost and the algorithm will easily get stuck in local fitness optima³⁰ because bad solutions have a very low survival and reproduction probability. If the selection pressure is chosen too low, the evolution advances very slowly because bad solutions have not a much lower survival and reproduction probability than good solutions which makes it difficult for good solutions to spread in the population. The bigger the tournament size, the finer you can adjust the selection pressure³¹. But the bigger the tournament size, the more tournament selection resembles other selection methods and loses its advantages.
Tournament selection produces a lot of net traffic in distributed systems. If the competitors come together for the tournament, nearly all their sourcecodes (which can be very long) have to be transferred through the net. This happens for each tournament and produces a huge individual-traffic. But even if a distributed tournament is arranged which requires only little net communication, tournament selection produces more individual-traffic than other selection methods. This is because it usually includes a relatively high death rate³² and every died individual requires one individual-transfer for the replacing code³³. Moreover, tournament selection includes a relatively high death probability for good individuals. If a tournament for example only consists of relatively good individuals which would normally not be selected to die, still the worse d of them are going to die. When evolving a big population on fast computers or on the internet, the net traffic could become a bottleneck and therefore should be kept at the minimum. Tournament selection is not the right method for reaching this goal.

As we have seen, tournament selection seems to be a rather bad selection method for distributed evolutionary systems. Therefore, it is not used for OOOPS. Instead, I developed a new organization of the GP system, that allows to use all non-tournament selection methods (like fitness-proportional selection, truncation selection or ranking selection) with truly asynchronous execution of the GP run³⁴. This organization required several other innovations for GP of which the two most important are described in the following.

Uses conjugation instead of crossover

As you already know from other sections (e.g. 1.3 and 3.2.1), OOOPS uses conjugation instead of crossover. This decision was made because conjugation only changes one of the two participating individuals which is ideal for asynchronous genetic programming. In the asynchronous approach we always only have one individual which is guaranteed to be at a specific state of the evaluation-selection-variation loop. When this individual is to be varied, with conjugation it can be chosen to get parts of any other registered individual regardless of the state of the last. But if crossover was to be used, the first individual would have to wait for the second to also reach the state of variation so that both would be ready for changes.

This is not the first time that conjugation is proposed for use with evolutionary algorithms. Peter W. H. Smith suggested a form of conjugation in [Smith, 1996] for GAs and GP which consists of copying a part of the donor genome to the identical position in the recipient genome replacing the original recipient code at that position. This conjugation only partly resembles the natural example, because in nature, the transferred genes do usually not replace the original genes but are simply added to the recipient genome.

The OOOPS conjugation follows the natural example in this respect by providing gene insertion without deleting the original code. But it also provides gene deletion for being a good replacement for crossover (we discussed this earlier) and for in average not blowing up the code length. The division of conjugation into insertion and deletion enables a great variability in code length with still (by using the same probabilities) not predetermining any direction of length development.

New reproduction management

The reproduction management of OOOPS has already been explained in section 3.2.1. An individual which is selected to produce offspring is registered in the parent table of the database. When an individual is selected to die, a random individual from this database table is used as parent of the replacing child and deleted from the table. If there is no entry in the table, a newly initialized individual is used as parent. This organization of the reproduction process is not only perfectly adapted to asynchronous distributed evolution (with a central unit), it also as a by-product has the advantage of refreshing the diversity of the population and trying new starting points for the exploration of the search space from time to time by inserting new random initial individual into the population. This only happens if there are no parents in the database, but as the experiments until now suggest, this is periodically the case. It is an interesting question for further research why this seems not to happen with a random frequency but in periodically changing phases of high and low offspring production. Could it perhaps be compared with periodic changes in natural population developments?

Dynamic population size

In section 3.2.1, we also already mentioned that new individuals can always be added to the population by starting new Ooopsis. So the population size can be varied in the course of the evolutionary run if desired. This is a good and important quality for distributing the evolutionary runs on the internet, because new users can always join in and start an Ooopsi on their computer. Yet, distribution of OOOPS on the internet would require some changes in the communication because Ooopsis then regularly change their address and are often unreachable. Consequently, it would make more sense to not let the Ooopsis communicate directly with each other but instead only let them talk to the Ooopse. This means that all individual sourcecodes should also exist in the Ooopsed database³⁵. Moreover, some security features would have to be added.

Multilevel evolution

As mentioned earlier, OOOPS not only determines fitness values for the individuals but also for the genes. Not only the individuals are selected, but also the genes underly a separate selection process which depends on their fitness. Genes are selected for being the parent of inserted genes in gene conjugation depending on their fitness rank. Of course, they are also varied (by mutation and possibly code conjugation). So not only individuals are evolving in the evaluation-selection-variation loop. Also their subunits (genes) follow these steps. This means that we have two levels of evolution in OOOPS. Such multilevel evolution is not unnatural. In fact, we can view many systems on very different levels in nature as evolutionary systems. Elliott Sober writes in [Sober, 1992] about this issue:

...If an object and its offspring resemble each other, the system will evolve, with fitter characteristics increasing in frequency and less-fit traits declining.

This abstract skeleton leaves open what the objects are that participate in a selection process. Darwin thought of them as organisms within a single population. Group selectionists have thought of the objects as groups or species or communities. The objects may also be gametes or strands of DNA, as in the phenomena of meiotic drive and junk DNA.

Outside the biological hierarchy, it is quite possible that cultural objects should change in frequency because they display heritable variation in fitness. If some ideas are more contagious than others, they may spread through the population of thinkers. Evolutionary models of science exploit this idea. Another example is the economic theory of the firm; this describes businesses as prospering or going bankrupt according to their efficiency.

Cooperative coevolution of genes

Genes in OOOPS have their own selection and they also reproduce. But they do not reproduce independently of their hosts because their fitness strictly depends on the individual fitnesses³⁶. As has been mentioned in section 2.1.3, this is an important precondition for cooperation between the genes to emerge. They must capture all effects that their behaviour has on others. So the genes in OOOPS evolve separately, but they evolve to cooperate. This type of evolution is called cooperative coevolution.

Metaevolution possible

All environmental influences on the individual are defined in the individual itself, either with the external message sources that feed the externally modelled environment into the individual or with the internal environment model in the distributed problem solving approach of OOOP³⁷. Also the whole functioning of ontogeny with the virtual space and the organization of diffusion and gene regulation is implemented in the individual. These parts of the individual are normally not evolvable because some are not part of the genome and therefore not really constitute the genotype of the individual (which is the only thing usually varied in artificial evolution) and they all do not correspond to attributes of natural genomes that change during the evolution of natural species³⁸. But because they are part of the OOOP individual, they can also be evolved with OOOPS. A very simple example would be that you initialize half of the population with individuals that update its cells in a prespecified and static order and the other half of the population with individuals that randomly choose the next cell that is to be updated. This difference does not exist in nature, but it can make a great difference in the behaviour of the object-oriented ontogenetic program. In the course of the evolutionary run, the individuals whose update method produces better results will supersede the others. This example is a very simple form of evolution, because the search space only contains two points. But it shows that OOOPS also allows to evolve parameters of the multicellular model itself and not only the genome of the modeled cell cluster. Such an evolution of the model can be viewed as a kind of metaevolution.

Homology possible

Nature uses some tricks to reduce the probability that crossover will have disastrous effects on the individual by destroying or removing important genes. Two important such tricks are speciation and homology. Both limit the possibilities of crossover on different levels. Speciation only allows individuals of similar species to produce offspring together. As crossover only happens with sexual reproduction, this means that crossover only happens between the genomes of similar species (which have similar genomes). Homology describes the fact, that crossover tends to exchange very similar parts of the genetic code between the two genomes which mostly serve the same function. This results in quite small changes and greatly reduces the probability to remove code parts providing important functions from a genome. It is very difficult to model this on the computer for enhancing evolutionary algorithms, because the similarity of function between genome parts cannot easily be measured. Nature apparently also does not measure function but in natural crossover, the similarity of the DNA sequence seems to be a sufficient indication of function for providing the advantages of homology. This usually does not hold in evolutionary computation. There has been some work on trying to model homology for GP either by only defining homologous code as being found at the same position in the genome [D'haeseleer, 1994,Altenberg, 1995] or by introducing a much more sophisticated definition of homology by comparing both structure and function of the code as proposed in [Banzhaf et al., 1998]. Though the results did not yet indicate a great breakthrough, it might be a good idea to try some kind of homology on OOOP.

One approach to homology for OOOP could be to introduce a genealogy for genes. Provided a good datastructure for storing the sequence of all past IDs of a gene, one could easily determine the genealogical distance between any two genes. You only would have to find the last ID common to both lines and add the number of IDs following this common ID including the actual gene IDs. In case of no common ID, the genes would not be related. As a gene changes its ID with every variation, this genealogical distance would provide a very easy measure of structural similarity between two genes³⁹.

In the current implementation of OOOPS, homology makes no sense because insertion and deletion are separate variation operators. Homology can only enhance genetic operators in which a part of a genome is replacing a part of another genome⁴⁰, because only then these two parts can be compared. But the only thing one has to change to enable homology for gene conjugation in OOOPS is to exchange gene insertion and gene deletion with a single gene conjugation operator which replaces a recipient gene by a donor gene. Then you can ensure some kind of homology by only allowing gene conjugation to exchange genes with a genealogical distance below a certain limit.

If you would analogously combine code insertion and code deletion into a single code conjugation, you could even view this genealogically limited code conjugation as a form of speciation for genes. A genealogical distance below the specified limit would mean that the genes are of the same "gene species" and can therefore be recombined.

Next: The Goal: Evolution of Up: How to Breed Multicellular Previous: Genetic Programming Contents Index