Evolutionary Computation Perl
Evolutionary Computation Perl
Evolutionary Computation Perl
Table of Contents
1. Introduction: an evolutionary algorithm step by step...................................... ?? Introduction to evolutionary computation...................................................... ?? X-Men approach to evolutionary computation .............................................. ?? Here comes sex .................................................................................................... ?? Fish market........................................................................................................... ?? The Canonical Genetic Algorithm .................................................................... ?? 2. Doing Evolutionary Algorithms with Algorithm::Evolutionary ............... ?? Introduction to evolutionary algorithms in Perl ............................................ ?? Canonical GA with Algorithm::Evolutionary ........................................... ?? Growing up: a whole evolutionary algorithm with Algorithm::Evolutionary .................................................................... ?? Extending Algorithm::Evolutionary ........................................................... ?? Frequently asked questions ............................................................................... ?? 3. References ................................................................................................................. ??
iii
iv
Warning
The programs in this tutorial have been tested with Perl 5.6.1.633, as downloaded from ActiveState1 in a Windows 98 system. And yes, I accept condolences for it; I didnt have any Linux machine handy during my holidays. Halfway through writing this tutorial, I downloaded Perl 5.8.0 for CygWin; some examples also work with that version, and I guess the rest should have no problem.
Warning
Download the whole tutorial, in PDF2 and the original DocBook document3
Although Darwins view of forces at work in the evolutionary process seems quite simple, putting them in black on white in an actual algorithm is something completely different. What kind of modications should be applied to the data structures? What do we do with modied data structures? Should we put them in the population, just like that? Should we (gulp), like, off some other member of the population? If so, which one? Historically, the rst solutions to this conundrum seemed to be: create a population of possible solutions, take a member of the population, change it until you obtain something better (from the point of view of how close it is to the solution, that is, its tness), and then eliminate one of the least t members of the population, possible the least t. That way, every time you introduce a new member of the population you are going one step up the evolutionary ladder (just like the X-Men11.) We will work, from now on, on the following problem: evolve a population of ASCII strings so that they are as close as possible to the string yetanother. The tness of each (initially random) string will be the difference between the ASCII values of the letter in each position and the ASCII value of the letter in the target string. Optimal strings will have 0 tness, so the process will try to minimize tness. The job is done by the following Perl program (1stga.pl, whitespace and POD comments have been suppressed):
#Declare variables (1) my $generations = shift || 500; #Which might be enough my $popSize = 100; # No need to keep it variable my $targetString = yetanother; my $strLength = length( $targetString ); my @alphabet = (a..z); (1) my @population; #Declare some subs (not the best place to do it, but we are going to #need them (2) sub fitness ($;$) { my $string = shift; my $distance = 0; for ( 0..($strLength -1)) { $distance += abs( ord( substr( $string, $_, 1)) ord( substr( $targetString, } return $distance; (2) } sub printPopulation { for (@population) { print "$_->{_str} -> $_->{_fitness} \n"; } } (3) sub mutate { my $chromosome = shift; my $mutationPoint = rand( length( $chromosome->{_str})); substr( $chromosome->{_str}, $mutationPoint, 1 ) = $alphabet[( rand( @alphabet))] } #Init population (4) for ( 1..$popSize ) { my $chromosome = { _str => , _fitness => 0 }; for ( 1..$strLength ) { $chromosome->{_str} .= $alphabet[( rand( @alphabet))]; } $chromosome->{_fitness} = fitness( $chromosome->{_str} ); push @population, $chromosome; (4)
(1) This is the initial setup for the evolutionary algorithm; it simply declares a group of variables: the number of generations, that is, the number of iterations that are going to be done if the solution is not found, the size of the population, or the number of different data structures present in the population, and several other constants that will be used through the program. (2) This function computes tness. And yes, it could be done in a single line. (3) This is the only variation operator we are going to use in this iteration of the evolutionary algorithm: it mutates a single charactere in a random position in the string, substituting it by another random character. (4) The population is initialized with random strings; at the same time, the data structure used for chromosomes, which is the conventional name the stuff that evolves is called, is also introduced. From the point of view of evolutionary computation, a chromosome is anything that can be changed (mutated) and evaluated to be assigned a tness, and tness is anything that can be compared. In most cases is a real number, but in some cases it can be something more complicated: a vector of different values, for instance. The data structure used for evolution is really unimportant, although, traditionally, some data structures, such as binary strings, oating-point vectors, or trees, have been used used; in many cases, also, there is a mapping between the data structure evolved and the data structure that is a solution to the problem: for instance, we might want to use binary strings to represent 2D oating point vectors, which are solutions to a numeric optimization problems. All in all, the representation issue has been the origin of endless wars in the eld of evolutionary computation.
Tip: Use the data structure that best suits your expertise, tickles your fancy, or the one that is closest to the problem you want to solve. Testing different data structures for performance will not hurt you, either.
(5) This is the meat of the program, the loop that actually does the evolution. Takes a random element of the population, creates a copy of it, mutates this copy until it nds a new string whose tness is better than the original, which is then inserted in the population eliminating the worst, which probably deserved it.
This program, when run, produces this output: cekyhtvvjh -> 97 mwehwoxscv -> 82 lalmbnbghi -> 81 [More like this...] Best so far: vowjmwwgft => 41 3
There are several problems with this algorithm. First, the population is not really used, and it is not actually needed. It is actually a hill-climbing algorithm, and very ineffective at that, since it takes an element, improves it a bit, puts it back into the population, takes another one... it would be much better to just take a random string, and start to change it until it hits target, right? In any case, since it is using a random mutation, what we are performing is basically random search over the space of all possible strings. Not an easy task, and this is the reason why the solution is usually not found, even given several hundred thousands iterations.
Tip: Blind mutation usually takes you nowhere, and it takes a long time to do so.
This indicates there is something amiss here; even if nature is a blind watchmaker13, it has the help of a very powerful tool: sex. And that is what we will use in the next section.
The exact form of the crossover will obviously depend on the data structure we are using; in some cases it might not even be possible; but the general idea is to combine two or more solutions, so that whatever is good about them mingles, to (maybe) give something even better. Since recombination is blind, the result might be better or not, but it is quite enough that combination yields something better some times for climbing up the evolutionary ladder. Crossover will be moved, along with the other utility functions, to a small module called LittleEA.pm, and takes the following form:
sub crossover { my ($chr1, $chr2) = @_; my $crossoverPoint = int (rand( length( $chr1->{_str}))); my $range = int( rand( length( $chr1->{_str}) - $crossoverPoint + 1)); my $str = $chr1->{_str}; substr( $chr1->{_str}, $crossoverPoint, $range, substr( $chr2->{_str}, $crossoverPoint, $range)); substr( $chr2->{_str}, $crossoverPoint, $range, substr( $str This is a possible implementation of a simple string crossover, with two parents and two offspring. Both parameters are passed by reference, and offspring take the place of parents. , $crossoverPoint, $ra }
Chapter 1. Introduction: an evolutionary algorithm step by step . A crossover point is chosen randomly, and them, a length to swap that cannot be bigger that the total length of both strings. The characters spanned by that range are swapped between the two chromosomes. Since both parents have the same length, it does not matter which parents length is used to generate the random crossover point; obviously, if variable-length strings are used, the minimal length will have to be used; for more complicated data structures, markers, or "hot points", are used sometimes. Crossover is used in the following program (2ndga.pl; some parts have been suppressed for brevity):
require "LittleEA.pm"; my $generations = shift || 500; #Which might be enough my $popSize = 100; # No need to keep it variable my $targetString = yetanother; my $strLength = length( $targetString ); my @alphabet = (a..z); sub fitness ($;$) { my $string = shift; my $distance = 0; for ( 0..($strLength -1)) { $distance += abs( ord( substr( $string, $_, 1)) ord( substr( $targetString, } return $distance; } my @population = initPopulation( $popSize, $strLength, \@alphabet ); printPopulation( \@population); (1) @population = sort { $a->{_fitness} <=> $b->{_fitness} } @population; for ( 1..$generations ) { my $chr1 = $population[ rand( @population)]; my $chr2 = $population[ rand( @population)]; #Generate offspring that is better my $clone1 ={}; my $clone2 ={}; do { $clone1 = { _str => $chr1->{_str}, _fitness => 0 }; $clone2 = { _str => $chr2->{_str}, _fitness => 0 }; mutate( $clone1, \@alphabet ); mutate( $clone2, \@alphabet ); crossover( $clone1, $clone2 ); $clone1->{_fitness} = fitness( $clone1->{_str} ); $clone2->{_fitness} = fitness( $clone2->{_str} ); } until ( ($clone1->{_fitness} < $population[$#population]->{_fitness}) || ($clone2->{_fitness} < $population[$#population]->{_fitness})); if ($clone1->{_fitness} > $population[$#population]->{_fitness}) { $population[$#population]=$clone1; } else { $population[$#population]=$clone1; } @population = sort { $a->{_fitness} <=> $b->{_fitness} } @population; print "Best so far: $population[0]->{_str}\n"; (1) printPopulation( \@population ); last if $population[0]->{_fitness} == 0; }
(1) The main loop is very similar to the rst example, except that now two parents, instead of only one, are generated randomly, then mutated to generate variation, and then crossed over. In this case, new offspring is generated until at least one is better than the worst in the population, which it eventually substitutes. This requisite is a bit weaker than before: in the previous program, a new chromosome was admitted in the population only if it was better than its (single) parent.
The output of this program, after running it typing perl 2ndga.pl 100000 will be something like thishnbsqpgknl -> 97 gaheewieww -> 92 [More like this...]
cwceluxeih kdcseymlot [Strings before crossover] cwceluxeot kdcseymlih
In fact, in most cases, a few thousands evaluations are enough to reach the target string. The tness of the best individual proceeds typically as shown in the gure below: The last two ttest words found before the solution are also shown in the generation they showed up for the rst time. They are at a distance of 2 and 1 from the target string, respectively; in this case, solution was found after around 2100 iterations; with two new individuals generated each generation, that means 4200 evaluations were needed to hit target. Not a bad mark. Not very good either, but not too bad. Summing up, looks like crossover has made a simple random search something something a bit more complicated, which combines information about search space already present in the population to nd better solutions; population allows to keep track of the solutions found so far, and recombination combines them, usually for the better.
Tip: Sex, is after all, important. And, for many, evolutionary computation without crossover cannot really be called that way, but something else: stochastic populationbased search, maybe.
Why does two point crossover work better than single-point crossover? For starters, the former is included by the latter (if the second point is the last character in the chromosome). Besides, it allows, in a single pass, the creation of complicated structures such as 101 from two chromosomes "000" and "111", that would need several applications of the operator and several intermediate with single-point crossover.
Fish market
Still, some incremental improvements can be made on this algorithm. So far, just the very last element in the population, the scum of the land, was eliminated in each iteration. But, very close to it, where others that didnt deserved the bits they were codied into. Ratcheting up evolutionary pressure might allow us to reach the solution a bit faster. Besides, anybody who has seen a National Geographic documentary program or two knows that, very often, only the alpha male, after beating anybody else who dares to move in the pack, gets the chance to pass its genetic material to the next generation; some other less violent animals like peacocks have to boast the best of its feathers to be able to attract peahens (if that term exists, anyways). All in all, while many of the worst die, some others lead a very boring life, because they dont get the chance to mate. These two sad facts of life lead us to the following improvement on the basic evolutionary algorithm:(3rdga.pl; some parts yadda yadda)
#Everything else is the same, except this loop for ( 1..$generations ) { for ( my $i = 0; $i < 10; $i ++ ) { my $chr1 = $population[ rand( $#population/2)]; my $chr2 = $population[ rand( $#population/2)]; #Generate offspring that is better my $clone1 ={}; my $clone2 ={}; do { $clone1 = { _str => $chr1->{_str}, _fitness => 0 }; $clone2 = { _str => $chr2->{_str},
In this case, rst, ten new chromosomes are generated each iteration, one in every iteration of the mutation/crossover loop. This number is completely arbitrary; it corresponds to 10% of the population, which means we are not really introducing a very strong evolutionary pressure. Each time a new chromosome is introduced, population is resorted (and this could probably be done more efciently by just inserting the new member of the population in the corresponding place and shifting back the rest of the array, but I just didnt want to add a few more lines to the listing). So, each generation, the ten worst are eliminated. Besides, the elements that will be combined to take part (if they pass) in the next generation, are selected from the elite rst half of the (sorted by tness) population. That introduces an additional element of efciency: we already know that what is being selected is, at least, above average (above median, actually). In fact, evolution proceeds faster in this case, but it does not become reected in the number of iterations taken. Why? Because it decreases the number of iterations needed before offspring "graduates", that is, before they become better than the last element of the population. Thus, on the whole, this algorithm runs a bit faster, but the number of generations needed to reach target is more or less the same.
Tip: In te same way as cattle breeders have known for a long time, breeding using the best material available actually improves performance of evolutionary algorithms. Use it judiciously.
However, magic numbers such as the "10" and "half population" inserted in that program are not good, even in a Perl program. We can alter that program a bit, making the selective pressure variable, so that we can select the proportion of elements that will be selected for reproduction from the command line, giving the fourth example so far17. With this example we can check what is the effect of reproduction selectivity in the time the algorithm takes to converge to the correct solution. We run it several times, with selectivity ranging from 10% (parents are selected from the best 10% of the population) to 90% (just 10% are considered not suitable for breeding). The effects are plotted in the following gure: Comparison of the evolutionary algorithm for different reproductive selectivity. In green we can see the original line, in which the reproductive pool was the best half of the population. The most elitist selection strategy seems to be the handsup winner, with the rest needing increasing number of evaluations to reach target with decreasing selective pressure. Looks like being selective with the reproductive pool is benecial, but we should not forget we are solving a very simple problem. If we take that to the limit, choosing just the two best (in each generation) to mutate and recombine, we would be impoverishing the genetic pool, even as we would be exploiting what has been 7
Chapter 1. Introduction: an evolutionary algorithm step by step achieved so far. On the other hand, using all the population disregarding tness for mutation and mating explores the space more widely, but we are reducing the search to a random one.
Important: Keeping the balance between exploration and exploitation is one of the most valued skills in evolutionary computation. Too much exploration reduces an evolutionary algorithm to random search, and too much exploitation reduces it to hillclimbing.
The reduction of the diversity is something any practitioner is usually afraid, and is produced by too much exploitation, or, in another terms, by the overuse of a small percentage of individuals in the population. This leads to inbreeding, and, ultimately, to stagnation of the population in a point from which it cannot escape. These points are usually local minima, and are akin to the problems faced by reduced and segmented wildlife populations: its effects, and increased vulnerability to plagues, has been studied, for instance, in the Ngorongoros population of lions. So far, we have used a greedy strategy to select new candidates for inclusion in the population: only when an operation results in something better than the worst in the population, we give it the right to life, and insert in in the new population. However, we already have a mechanism in place for improving the population: use just a part of the population for reproduction, based on its tness, and substitute always the worst. Even if, in each generation, we do not obtain all individuals better than before, it is enough to nd at least a few ones that are better to make the population improve. That is what we do in the 5th example 5thga.pl18. The number of individuals generated in each iteration can be passed in the command line, and the reproductive selectivity can be also altered, as before. Results are plotted in the following gure: (Bad) Comparison among an evolutionary algorithm in which, in each generation, 25 new elements are generated, chosen from the 25 best (pinkish) or 10 best (brown), 50 and 50 (blue) or 10 (light blue), and the previous results (where 10 new elements were renewed every generation). Not using a greedy algorithm to select new individuals, does not make results much worse, even if we take into account that we are not comparing the same thing here, because the actual number of evaluations until one better than before is reached is not measured; the number of evaluations shown. Other than that, the effect of renewing a different proportion of the population depends on how many we have chosen in advance to substitute the eliminated population: if the genetic pool is big, substituting more improves results (green vs dark and light blue lines, 10% and 50% substitution rate); if the genetic pool is small (25%), results look better if more chromosomes are substituted (pink vs brown and red). This might be due to the balance between exploration and exploitation: by generating too many new elements (50% substitution rate) we are moving the balance towards exploration, turning the algorithm into a random search; but if the pool is small (25%), generating too few would shift the balance toward exploitation, going to the verge of inbreeding; in that case, generating more individuals by crossover leads to better results.
Tip: As David Goldberg19 said in Zen and the Art of Genetic Algorithms20, let Nature be your guide. There is no examination board in Nature that decides whats t for being given birth or not; even so, species adapt to their environment along time. Evolutionary algorithms follow this advice.
Important: There are a couple of lessons to be learned from this last example: rst, plain selection by comparison of each new individual with the current generation is enough for improving results each step; second, the balance between reproductive and eliminative selectivity is a tricky thing, and has a big inuence in results. In some case, being very selective for reproduction and renewing a big part of the popula-
(1) Declaration of a class that belongs to the Algorithm::Evolutionary module. This module will be used extensively in the second chapter; so far, if you feel the need, download it from the SF web site23. This module is for creating roulette wheels that are biased with respect to probability: the probability that the "ball" stops at one of its slots is proportional to its probability. (2) The probabilities for the wheel are created, taking into account tness. Since, in this case, lower tness is better, tness has to be inverted to create the 9
Chapter 1. Introduction: an evolutionary algorithm step by step roulette wheel; that way, individuals with lower tness (closest to the target string) will have a higher chance of being selected (3) A Wheel object is created, and used later on to select the individuals that are going to be cloned and reproduced.
The canonical genetic algorithm is the benchmark against any new algorithm will be measured; it performs surprisingly well along a wide range of problems, but, in this case, it is worse than our previous example, as is shown in the next plot: Evolution of the tness of the best individual for the canonical GA. It needs 160 generations (in this case) to reach the optimum, which is worse than the best cases before. Actually, in simple problems, strategies that favor exploitation over exploration sometimes are more successful than the canonical GA, however, this is always useful as a rst approximation. It should be noted also that, unlike previous examples, since the best of the population are not kept from one generation to the next, tness can actually decrease from one generation to the next. The canonical genetic algorithm manages to keep a good balance balanced between exploration and exploitation, which is one of its strong points; this makes it efcient throughout a wide range of problems. However, its efciency can be improved a bit by just keeping a few good members of the population; that way, at least we make sure that the best tness obtained does not decrease. That members will be called the elite, and the mechanism that uses them, elitism. We will introduce that mechanism in the next instance of the example, canonical-ga-elitism.pl, which is substantially similar to the previous one, except that the rst two chromosomes of each generation are preserved for the next. Results obtained are shown in the following plot: A comparison of the evolution of the canonical GA, with and without elitism. Elitism performs better, obtaining the same result in a third as many generation. Surprisingly enough, this improvement comes at little cost; there is no signicant diminution in diversity during the run, maintaining a good pool of different strings all the time (you can use the freqs.pl program to check that, if you feel like it).
Tip: Keeping track of the best-t chromosomes, and reintroducing them at the next generation, improves performance if done wisely; without the cost of a loss of diversity that can degrade performance in more complex problems.
Even if now, the exact data structure that is being evolved is not that important, original genetic algorithms used, mainly for theoretical reasons (respect the sieve of schemas), binary strings. And one of the most widely used benchmarks for binarystring evolutionary algorithms (or simply GAs), is the "count ones" problem, also called "ONEMAX": in this problem, the target string consists of all ones. This problem is solved in the next program (canonical-classical-ga.pl)
(1) use Algorithm::Evolutionary::Wheel; require "LittleEA.pm"; my $numberOfBits = shift || 32; my $popSize = 200; # No need to keep it variable my @population; sub fitness ($;$) { my $indi=shift; my $total = grep( $_ == 1, split(//,$indi )); return $total; } sub mutateBinary {
(1)
10
(1) The rst lines of the program differ a bit: it takes as an argument the number of bits, and the population is bigger. Fitness is also different: the fitness subroutine splits the binary strings to count the number of ones, which is returned. (2) The mutateBinary subroutine is also different: after selecting a position to mutate, it ips the bit in that position. A mutation operator that ips several bits could be thought of, but the same effect is achieved with several applications of the same operator. More complicated mutation operators use "hot spots" to mutate, or evolve the mutation rate, or change the probability of mutation for each locus in the chromosome. Sometimes, these strategies improve performance, some others are not worth the hassle.
As expected, this program nds the solution eventually; it is only shown here for historical reasons. Just by changing the tness function, many problems that admit a binary codication could also be solved, from the MAXSAT optimization problem, to the well-known traveling salesperson problem.
11
Notes
1. http://www.activestate.com 2. evolutionary-computation-perl.pdf 3. evolutionary-computation-perl.xml 4. http://everything2.com/index.pl?node_id=1114530 5. http://geneura.ugr.es/~jmerelo/GenMM 6. http://www.liacs.nl/~jvhemert/eartweb/ 7. http://evonet.dcs.napier.ac.uk/evoweb/resources/books_journals/bjp343.html 8. http://citeseer.nj.nec.com/fang94genetic.html 9. http://www.iit.edu/~elrad/esep.html#esep 10. mailto:jmerelo at geneura.ugr.es 11. http://www.xmenunlimited.com/ 12. 1stga.pl 13. http://www.world-of-dawkins.com/Dawkins/Work/Books/blind.htm 14. LittleEA.pm 15. 2ndga.pl 16. 3rdga.pl 17. 4thga.pl 18. 5thga.pl 19. http://gal4.ge.uiuc.edu/goldberg/d-goldberg.html 20. http://citeseer.nj.nec.com/context/17642/0 21. http://www.amazon.com/exec/obidos/ASIN/0201157675/perltutobyjjmere 22. canonical ga.pl 23. http://sourceforge.net/projects/opeal 24. canonical-ga-elitism.pl 25. freqs.pl 26. canonical-classical-ga.pl
12
13
(1) After a somewhat peculiar declaration of the class (needs to be done this way because it is where it is installed by default, maybe it is a bug), we have to subclass the basic AI::Gene class, rst to create a rendering of the chromosome so that it looks the same as our previous examples, and then to change the basic denition of mutation, which originally used "character classes"; something we dont need here. It needs to change no further, since it uses as basic alphabet the English lowercase alphabet, as we did in our original programs. (2) The data structure used to represent the chromosome is an array-of-arrays, instead of a hash; the rst component of the array contains the chromosome; this tness function takes that chromosome array, and returns tness. The second component of the array will be used for the tness, as will be seen later on. (3) Crossover is also modied according to the new data structure; arrays are used instead of strings. The rest of the program is not highlighted, but has also been modied according to the new data structure. (4) Initializing the chromosome means now creating a new object of the new class MyGene, and then initializing it via the provided AI::Gene::mutate_insert method, that inserts new characters up to the required number. (5) Mutation is performed via the provided AI::Gene::mutate_minor, that changes a single character (given as parameter). The rest of the program is the same as before, except for the specic methods used to print the chromosome. 14
Chapter 2. Doing Evolutionary Algorithms with Algorithm::Evolutionary All in all, some useful code is provided by AI::Gene, but, still, you have to write a substantial part of the program. Besides, in our opinion, functionally, mutation operators are functions applied to chromosomes, not part of the chromosome interface, and, as such, should be considered independent classes. In the way AI::Gene is designed, any new mutation-like operator can be added by subclassing the base class, but it will not be a part of the class, unless you overload one of the existing functions (like mutate_minor). And, nally, it lacks any classes for doing algorithm-level stuff: selection, reproduction, which have to be done by the class client.
Note: AI::Gene can be a good CPAN starting point for evolutionary computation, but it has some way to go to become a complete evolutionary algorithm solution.
There are several other published tools you can use to perform genetic algorithms with Perl. Two of them,AI::GA4 and Algorithm::Genetic5 are simple and straightforward implementations of a genetic algorithm. An article by Todor Zlatanov, Cultured Perl: Genetic algorithms applied with Perl Create your own Darwinian breeding grounds6, describes a system for doing Genetic Programming with Perl, and includes sample code; this article gets the most mentions in the Perl community. A library, MyBeasties7, stands out among the rest. It is a very complex and general implementation of an evolutionary algorithm for any kind of genotype, which, besides, has its own language for describing them and its transformations and evaluations. It features many classes for mutation and recombination, but it lacks classes for higher-level operations, and for implementing different kind of algorithms. Its learning curve is somewhat steep, anyhow.
15
}; my $ez = new Algorithm::Evolutionary::Op::Easy $fitness; my $popSize = 100; my $indiType = BitString; my $indiSize = 32; my $e = new Algorithm::Evolutionary::Experiment $popSize, $indiType, $indiSize, $ez; print $e->asXML(); my $populationRef; do { $populationRef = $e->go(); print "Best so far: ", $populationRef->[0]->asString(), "\n"; } until ($populationRef->[0]->Fitness() == $indiSize); print "Final\n", $e->asXML();
. Easy as breading butter, and twice as tasty, aint it? Well, rst of all, we are not doing the canonical GA, but a steady-state GA that, each generation, substitutes 40% of the population (a default value). The program goes like this: after loading needed classes, we declare the tness function. In Algorithm::Evolutionary, the chromosome can be any data structure, but the actual data structure evolved is always accessible via the Chrom method. In this case, that method returns a string, that is dealt with as before, to return the total number of ones. After the tness declaration, an Algorithm::Evolutionary::Op::Easy algorithm object is created. That object is passed to another Algorithm::Evolutionary::Experiment object, which contains all stuff needed to solve the problem: algorithm, plus population. You can print that experiment object as an XML document, which would look more or less like this:
<ea xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation=ea-alpha.xsd version=0.3> <!-- Serialization of an Experiment object. Generated automatically by Experiment $Revision: 1.2 $ --> <initial> <op name=Easy ><op name=Bitflip rate=1 > <param name=howMany value=1 /> </op> <op name=Crossover rate=1 > <param name=numPoints value=2 /> </op>
<param name=selrate value=0.4 /><code type=eval language=perl> <src><![CDATA[{ my $indi = shift @_; my $total = grep(($_ == 1), split(//, $indi->Chrom, 0)); return $total; }]]> </src> </code> </op> </initial> <!-- Population --><pop> <indi type=BitString > <atom>0</atom> <atom>1</atom> <atom>1</atom> <atom>1</atom </indi> <!-- more indis like this one --> </pop> </ea>
. By default, the "Easy" operator includes Bitip mutation and crossover, each with a rate of 1 (that means that they are applied with the same probability). Each one takes a parameter, with are passed via the tag param. The Easy operator takes also a parameter, and another code section, which is converted to a subroutine of the same name as included in the attribute eval; the language attribute is included for future extensions. The source code within that tag does not look exactly the same as the one above, because it has been de-parsed (using B::Deparse) from the original pointer-to-sub.
16
Chapter 2. Doing Evolutionary Algorithms with Algorithm::Evolutionary This is the kind of stuff that makes Perl unique; having a compiler/decompiler embedded in the same interpreter makes easy to serialize even complicated stuff, as data structures with pointers-to-function. I cant imagine how this could be done in C++, and its probably impossible in Java too (Is it possible in Ruby or Python, I wonder?)
After the initial section, comes the pop section, that includes the components of the initially generated population. Each individual is enclosed by the tag indi, with a type attribute that indicates the class the individual belongs to, and them, one atom for every "atomic" component of the data structure. That XML document can be retrieved back into a program by loading the le into a variable $xml and using this:my $experiment= Algorithm::Evolutionary::Experiment->fromXML( $xml ); . However, as we said, this is not the canonical genetic algorithm. The program that implements it would be use the CanonicalGA class, like the example in ea-ex2.pl, which is exactly the same, except that, instead of declaring an Easy object, we declare a CanonicalGA. This object, besides implementing a canonical GA without elitism, uses QuadXOver, the crossover used before that takes two arguments by reference and returns offspring in the same arguments. The Crossover object takes arguments by value, not modifying them, and returns a single offspring. A different problem might require a different tness function, and probably different type of individuals. The default Algorithm::Evolutionary distribution includes four classes: Vector, for anything vectorial, from strings represented as vectors, through vector of oating point numbers, up to vectors of frobnicated foobars; String, with the BinaryString subclass, and Tree, for doing Genetic Programming or anything else that requires a tree data structure for representation. Using any of these data structures for solving a problem is left as an exercise to the student.
Algorithm::Evolutionary problems can be specied using Perl, but you can use an "universal reader" (ea-ex3.pl)15 to read the description of the algorithm
in XML.
#!perl use strict; use warnings; use Algorithm::Evolutionary::Experiment; my $xmlDoc = join("",<>); my $e = Algorithm::Evolutionary::Experiment->fromXML($xmlDoc); my $populationRef = $e->go(); print "Final\n", $e->asXML();
This reader has been listed here in its entirety, but, however, since the CanonicalGA which has been used in the previous example performs a single generation, it is quite limited, and only takes you so far. That is why we need to implement a whole genetic algorithm using Algorithm::Evolutionary classes (and see how they get reected in the XML document).
Chapter 2. Doing Evolutionary Algorithms with Algorithm::Evolutionary 4. elimination of a part of the population, and 5. checking for termination conditions , many of those parts can be delegated to sub-algorithms.
Algorithm::Evolutionary includes skeleton classes, GeneralGeneration and FullAlgorithm, with pluggable submodules, so that, on the basic schema, you
can mix and match any combination of sub-algorithms. This is how it is done in the next example :ea-ex4.pl:
#!perl use strict; use warnings; use Algorithm::Evolutionary::Experiment; use Algorithm::Evolutionary::Op::Creator; use Algorithm::Evolutionary::Op::Bitflip; use Algorithm::Evolutionary::Op::Crossover; use Algorithm::Evolutionary::Op::RouletteWheel; use Algorithm::Evolutionary::Op::GeneralGeneration; use Algorithm::Evolutionary::Op::DeltaTerm; use Algorithm::Evolutionary::Op::FullAlgorithm; my $numberOfBits = shift || 32; my $popSize = 100; my $fitness = sub { my $indi = shift; my $total = grep( $_ == 1, split(//,$indi->Chrom() )); return $total; }; (1) my $creator = (2) new Algorithm::Evolutionary::Op::Creator( $popSize, BitString, { length => $numberOfBits }); my $selector = new Algorithm::Evolutionary::Op::RouletteWheel $popSize; my $mutation = new Algorithm::Evolutionary::Op::Bitflip; my $crossover = new Algorithm::Evolutionary::Op::Crossover; my $replacementRate = 0.4; #Replacement rate my $generation = new Algorithm::Evolutionary::Op::GeneralGeneration( $fitness, $selector, (2) [$mutation, $crossover], $replacementRate ); (3) my $terminator = new Algorithm::Evolutionary::Op::DeltaTerm $numberOfBits, 0; my $algorithm = new Algorithm::Evolutionary::Op::FullAlgorithm $generation, $termin my $experiment = new Algorithm::Evolutionary::Experiment $creator, $algorithm;(4) print<<EOC; <?xml version="1.0" standalone="no"?> <experiment> EOC print $experiment->asXML(); (4) $experiment->go(); print $experiment->asXML(), "</experiment>";
(1) Declaration of a Creator, which is factory class for algorithm individuals, Bitstrings in this case. This class takes as arguments the number of elements it will produce, the name of the class, and a hash that contains a list of named arguments, that will be passed to the class constructor. In this case, just the length of the chromosome is needed (2) A Generation includes a selector, which decides what elements of the population will be used for reproduction. In this case, RouletteWheel is chosen, the same reproductive method we have seen before: the elements of the population are selected with a probability proportional to its tness. Another option would have been TournamentSelect, which takes a set of several individuals, and select only the best of them. It is a greedier way of selection, and its greediness depends on the number of elements in the tournament: the higher the number, the greedier. Crossover and mutation operators are declared in the next line; in Algorithm::Evolutionary, operators are independent classes, so that you are free to declare and use as many as you want. The operators are passed in an array ref to the Generation, and are selected according to rate: by default, each operator gets a rate of 1, and thus have the same probability. Rate can be changed during runtime, since it is an instance variable. Finally, the $replacementRate rules how many elements of 18
Chapter 2. Doing Evolutionary Algorithms with Algorithm::Evolutionary the population will be substituted each generation. Finally, the object is declared, passing all the previous stuff as arguments. (3) Finally, the full algorithm itself, in all its majesty, is declared. It needs the previously-declared $generation object, plus a termination condition. And then, the two operators that are going to be applied sequentially to the population, the creator and then the algorithm, are used to create an Experiment object. This object takes as an argument operators that will be applied sequentially to the population; any operator whose apply method takes an array as an argument could be passed here. It could be possible, for instance, to use a creator, then an evolutionary algorithm, then a bridge, and then another kind of algorithm, population-based incremental learning, for instance, or simulated annealing, to improve the nal result. This, of course, could be done in a single operator. (4) The algorithm is run, and its initial and nal state is included in a wellformed XML document that is sent to standard output.
The good thing about having this XML output is that you can process it very easily, for instance, to pretty-print nal population, using this XSLT stylesheet17 to obtain this web page18. The XML document can be used for post-algorithmic analysis, for interchange with other evolutionary algorithms, possibly written in other languages, or even for external data representation for parallel and distributed algorithms. For instance, the output of the algorithm can be converted to a combined HTML/SVG (Scalable Vector Graphics)19 document, which can be used for presentation straight away. It could also be imagined a "literate evolutionary computation" application that would mix the output of an evolutionary algorithm, with the description of the classes obtained via pod2xml, to create an XSLT stylesheet that would process output and create a document with output along explanation. This is left as an exercise to the reader.
Tip: XML is cool. Nuff said.
Extending Algorithm::Evolutionary
First, we are going to take advantage of a Perl module, to extend our library with new classes. Theres a wealth of Perl modules out there, and many of them are devoted to working with data structures such as lists or strings. For instance, the Algorithm::Permute class comes in handy to create a permutation operator that acts on strings (included binary strings), as is done next
(1) package Algorithm::Evolutionary::Op::Permutation; use Carp; use Algorithm::Evolutionary::Op::Base; use Algorithm::Permute; our @ISA = qw (Algorithm::Evolutionary::Op::Base); our $APPLIESTO = Algorithm::Evolutionary::Individual::String;(1) our $ARITY = 1; (2) sub new { my $class = shift; my $rate = shift || 1; my $self = Algorithm::Evolutionary::Op::Base::new( Algorithm::Evolutionary::Op::P return $self; (2) } (3) sub create { my $class = shift; my $rate = shift || 1; my $self = { rate => $rate }; bless $self, $class; (3) return $self; (4) }
19
(1) This is the usual introduction to modules, which should be preceded with some POD documentation: description, synopsis, and so on. After declaration of the package name, we declare needed modules:Algorithm::Permute20, a class for fast permutations, and base class for all operators, Algorithm::Evolutionary::Op::Base. Two constants should be dened also for the module: one of them is optional, the $APPLIESTO variable, which states to which individual class it might apply to; this will be used in the apply method, but if it applies to a whole hierarchy, for instance, all subclasses of String, its better to nd out a more sophisticated check; the second one, $ARITY, is used by other objects to nd the number of arguments the apply method needs.
Tip: Do not reinvent the wheel: always look up CPAN when writing operators or individuals; you might nd the right class for the job.
(2) The new method does not do much this time, other than forward object creation to the base class (as all objects should do). An operator just has a rate as default variable; and this one has not got any other. (3) This is equivalent to new, and its a fossil. Do not worry about it, it will probably be eliminated in other versions of the library (4) This is the most important method: is the one that actually does the job, and the one it is called to modify chromosomes. Chromosomes are passed by value, that is why it is cloned, and the result of the modication is returned; this is the way the higher-level classes, such as Algorithm::Evolutionary::Op::Full, expect them to behave that way, but they might do something different for particular algorithms (for instance, Algorithm::Evolutionary::Op::QuadXOver takes both arguments by reference, and is used within the canonical GA). This method creates a permutation object from the chromosome, and permutes it, assigns it back to the created chromosome, and returns it.
Other methods, such as set or asXML, can also be overridden from the base class, but just if you want to do something very specic; the base class versions will do just ne in most cases. Sub-classing an Individual, or creating new kinds of data structures to evolve, is just as simple, and it is left as an exercise to the reader. We will use this class to evolve DNA strings; just as we did before, but, rst, our target string will only be composed of A, C, G and T, and, second, we will have no "distance to char", but overall distance among strings. The exercise is purely academic, but a similar problem is solved when sequence alignments want to be done, only in that cases there are several targets. We will use another module, String::Approx21 to compute approximate distances among strings (ea-ex5.pl).
use Algorithm::Evolutionary::Op::Creator; use Algorithm::Evolutionary::Op::Permutation; use Algorithm::Evolutionary::Op::IncMutation; use Algorithm::Evolutionary::Op::Crossover; use Algorithm::Evolutionary::Op::CanonicalGA; use String::Approx qw( adistr ); my $target =shift || CGATACGTTGCA;
20
my $maxGenerations = shift || 100; my $popSize = shift || 100; my $fitness = sub { my $indi = shift; return 1 - abs ( adistr( $indi->Chrom, $target ) ); }; my $incmutation = new Algorithm::Evolutionary::Op::IncMutation; my $mutation = new Algorithm::Evolutionary::Op::Permutation; my $crossover = new Algorithm::Evolutionary::Op::Crossover; my $ez = new Algorithm::Evolutionary::Op::Easy $fitness, 0.4, [$mutation, $crossover my $indiType = String; my $hash = { length => length( $target ), chars => [A,C,G,T]} ; my $creator = new Algorithm::Evolutionary::Op::Creator( $popSize, String, $hash); my @population = (); $creator->apply( \@population ); my $gen; do { $ez->apply (\@population ); print "Best so far: ", $population[0]->asString(), "\n"; } until ( $population[0]->Chrom eq $target ) || ($gen++ > $maxGenerations) ; print "Final\n", $population[0]->asString();
This program is very similar to previous examples. The only differences are that we use a different kind of chromosome, Individual::String, which uses any alphabet, and that we use several variation operators: Op::IncMutation, which increments a single element in the chromosome by one, taking into account the alphabet (that is, it would cycle A -> C -> G -> T); Op::Permutation, which we just declared. The tness returns the distance between the string and the target string, taking into account the length difference, and the insertions and deletions needed to turn a string into the other. This is a problem, since AA and TA will have the same distance to GA, and there are many mutations which are neutral, leading to no change in tness. Furthermore, strings such as GAAAAA are at a distance of 1 (or 1/divided by total length) from AAAAAG, but a very lucky permutation is needed to turn one into the other. This leads to the fact that, in this case, the evolutionary algorithm does not always nd the solution.
Warning
Combinatorial optimization problems like this one are usually hard for evolutionary algorithms (and for any other search method, for that matter). It always help to have a good knowledge of the problem, and use any other methods available to us to improve search and make the tness landscape less rugged.
Chapter 2. Doing Evolutionary Algorithms with Algorithm::Evolutionary blindfolded and single-handedly, youd better do evolutionary algorithms with Perl than with, say, Fortran 9X, even if this language is able to extract the last drop of performance from your old processor. If you have to learn a new language, plus write an evolutionary algorithm in it, performance does not matter so much. 2. What other kind of cool stuff can you do with evolutionary computation and Perl? Besides the aforementioned GlotBot, theres something very similar, written using Algorithm::Evolutionary and HTML::Mason, available from http://geneura.ugr.es/Mason/EvolveWordsPPSN.html. The evolutionary algorithm was also combined with SOAP::Lite to carry out evolutionary algorithms with distributed population (code is available along with the rst versions of OPEAL). As a degree project, some students of mine used OPEAL to optimize fantasy soccer teams, by optimizing step-by-step the team, or by optimizing the rules used to substitute players from one set of matches to the next. And, nally, using the same library, we optimized the assignment of papers to reviewers for the PPSN 2002 conference24.
Notes
1. http://www.ocf.berkeley.edu/~jkunken/glot-bot/ 2. http://search.cpan.org/author/AJGOUGH/AI-Gene-Sequence-0.21/ 3. cga-ai-gene.pl 4. http://www.skamphausen.de/software/AI/ga.html 5. http://www.perlmonks.org/index.pl?node=Algorithm%3A%3AGenetic 6. http://www-106.ibm.com/developerworks/linux/library/l-genperl/ 7. http://mybeasties.sourceforge.net 8. http://eodev.sourceforge.net 9. http://geneura.ugr.es/~jmerelo/GAJS.html 10. http://sourceforge.net/projects/opeal 11. http://www.xml.com 12. http://listserv.activestate.com/mailman/listinfo/perl-xml 13. ea-ex1.pl 14. ea-ex2.pl 15. ea-ex3.pl 16. ea-ex4.pl 17. nal.xsl 18. ea-ex4.html 19. http://www.w3.org/TR/SVG/ 20. http://search.cpan.org/author/EDPRATOMO/Algorithm-Permute0.04/Permute.pm 21. http://search.cpan.org/author/JHI/String-Approx-3.19/ 22. ea-ex5.pl 23. http://geneura.ugr.es/Mason/EvolveWordsPPSN.html 24. http://ppsn2002.ugr.es
22
Chapter 3. References
There are several books that deal with evolutionary computation thoroughly. The one that is more similar in approach to this tutorial is Genetic Algorithms+Data Structures = Evolution Programs1, by Zbigniew Michalewicz; already in its third edition, it has a practical approach from the beginning in its rst part, and has a second part devoted to applications. Another interesting book is Introduction to Genetic Algorithms2, by Melanie Mitchell; although it devotes too much space to evolving cellular automata, it has a good balance between theory, practice and applications. The Hitchhikers guide to evolutionary computation (HHGTEC)3 is the eld FAQ. Besides the FAQ, it includes lots of resources, links to other web pages, mailing lists, and home pages related to the subject. The main conferences in the eld are GECCO (Genetic and Evolutionary Computation Conference)4 and CEC (Congress on Evolutionary Computation)5, which take place annually, and are big events with lots of people and humongous proceedings, and PPSN, Parallel Problem Solving From Nature6, which takes place biannually (in even years) in Europe, usually in September; it is an smaller event, with around 150-200 attendees. EvoNet7 is the European network for evolutionary computation, a consortium of European university departments and enterprises devoted to the promotion and application of evolutionary computation, in all its forms. Its web site contains all kind of things, from tutorials, to case studies, to lists of places where you can get degrees on evolutionary computation.
Notes
1. http://www.amazon.com/exec/obidos/ASIN/3540606769/perltutobyjjmere 2. http://www.amazon.com/exec/obidos/ASIN/0262631857/perltutobyjjmere 3. http://www.cs.bham.ac.uk/Mirrors/ftp.de.uu.net/EC/clife/www/ 4. http://www.isgec.org 5. http://www.wcci2002.org/cec/call.html 6. http://ppsn2002.ugr.es 7. http://http://evonet.dcs.napier.ac.uk/
23
Chapter 3. References
24