DevMaster.net  
Home | Forums | 3D Engines Database | Wiki | Articles/Tutorials | Game Dev Jobs | IRC Chat Network | Contact Us

Artificial Intelligence > Genetic Algorithms


Genetic Algorithms: Explanation and Implementation Tradeoffs      
19/05/2004

Introduction

The class of stochastic search strategies commonly classified by the name genetic algorithms provides a very powerful and general mechanism for searching complex solution spaces. Genetic algorithms find application in many NP-Hard and NP-Complete optimizations problems. Essentially these problems do not solve nicely using strictly analytic methods and therefore search strategies are typically applied.

In context, for example, deriving the connection weights in a path finding neural network would qualify as a complex solution space for non-trivial neural networks.

There is a certain amount of vocabulary which must be accepted to make the following discussion concise yet understandable, please forgive. I have supplied brief definitions of these terms in the Definitions section. For more formal definitions consult [1] and [2].

Overview

Genetic algorithms (GA) mimic the biological processes underlying classic Darwinian evolution in order to find solutions to optimization or classification problems. GA implementations utilize a population of candidate solutions (or chromosomes). Each chromosome in the current generation is evaluated using a fitness function and ranked. From the ranking candidates are selected from which the next generation is created. The process repeats until either the number of iterations is exceeded or an acceptable solution is found.

Details of implementation are varied but a generic view of a GA would include:

  1. Initialization of the initial population (either randomly or from a best guess or previous partial solution)
  2. Evaluation of the fitness function on each chromosome to determine ranking.
  3. Application of the selection method on the population to determine mating rights.
  4. Application of the genetic operators on the chromosomes selected for mating.
  5. Return to Step #2.

The fitness function is a measure of how well a candidate solves the problem. Examples of an optimization problem would be find the minimum/maximum y given y=F(x), with y=F(x) being the fitness function.

Implementations vary in the choice and practice of the selection method; suffice to say that the purpose of the selection method is to choose candidates whose genetic mix will tend to lead to improved candidate solutions in the next generation. Examples of common selection methods include random, elitist, roulette wheel, tournament, etc.

Genetic operators provide mixing of chromosome portions from the parent or parents to form the offspring of the next generation. Examples of genetic operators include crossover, mutation, inversion, etc.

In the following paragraphs I deliberately describe the steps in reverse order. It is my feeling that, for good software design, especially performance critical software, it is important to grasp the low level details of the larger processes. Generally the design of the basic chromosome structures have more performance impact than the fitness function. You are usually stuck with the fitness function while the organization of the chromosome effects the performance of the genetic operators. This is a broad statement which should be qualified by your particular application constraints.

Chromosome Implementation

One of the first issues that must be resolved when crafting your GA engine is the form of the basic chromosome. This determines the requirements and complexity of your genetic operators and directly effects the performance of those operators, which in turn drives one vector of the over all performance of the GA engine.

In part, your chromosome implementation will be driven by the type of problem you are solving. Is it number based; integer or real, what are the precision requirements? Is it symbolic; arbitrary symbol length or fixed, encoded or pre-decoded? etc.

Convenient Range Values

Consider a problem to find the minimum value of y for the following equation:

y = 100(x12– x22)2 + (1 – x1)2    for -2047 ≤ xi ≤ +2048

This would be the fitness function for the problem. A possible representation of the x values would be two 12 bit binary values concatenated, giving a chromosome similar to:

AAAAAAAAAAAABBBBBBBBBBBB

While this seems like a very straight forward encoding scheme there are subtle aspects to this that make a straight forward encoding not necessarily the best solution. For statistical reasons it is often more advantageous to select a binary encoding where adjacent values have a Hamming distance of 1. Consider the difference between the encoded value of 1023 and 1024:

1023: 001111111111
1024: 010000000000

The straight forward binary encoding causes all 12 bits of the encoding to flip for a single increment of xi at the 1023/1024 boundary. This is the worst case but similar discontinuities exist at every power of two.

Informally, think of the GA as making iterative and somewhat random changes to the encoded values of candidate solutions through the genetic operator process. However due to the encoding scheme incremental improvements to the real value of xi require drastic changes to the encoded form along the boundaries of integer powers of two.

I will make the unsupported claim that these discontinuities in encoding make it more difficult for the GA to converge and point the reader to [3] for a derivation of the mathematics to support this assertion.

An encoding which has a Hamming distance of 1 between adjacent values is often a better implementation. Grey code is one example.

More Interesting Range Values

But suppose your xi range is not conveniently represented as a binary value, perhaps

-6.283 ≤ xi ≤ +6.283

Now you have a real decision to make concerning representation. Depending on the capabilities of the hardware which runs your GA engine it may be preferable to implement a fixed point representation of x rather than a floating point, or in some cases a look up table may make more sense if even fixed point is not suitable.

In the case of fixed point three bits are required for the integer portion and 9 bits are required for the fractional portion. So your chromosome may look something like:

SaAAAaaaaaaaaaSbBBBbbbbbbbbb

Sa Sb represent the sign bits of x1 and x2 A B represent the integer portions of x1 and x2 a b represent the fractional portions of x1 and x2

The genetic operator phase of the GA will quasi-randomly mix bits from one parent to the other to form the offspring. Thus without constraint it is possible that the resulting offspring could contain 111 in AAA. This is outside the range of the x1 variable and results in an unusable offspring.

Illegal candidates can be handled in many ways. One possible implementation is to allow the genetic operator to form illegal results (simplifies the operator making it faster) then evaluate them for legality (creates variability in the time it takes to create a valid offspring, often a problem in real time situations like games).

01111111111111 equates to 6.283
01111111111110 equates to 6.282
01111111111101 equates to 6.281

This has the advantage of simplifying the genetic operator implementation, but retains the illegal value problem. However it is possible to adjust the dynamic range in the lookup up table automatically by adjusting the encoding such that a legal value occurs more than once, smoothing out the probability distribution to account for the over representation of the FLOOR or CEILING values.

Dynamic range also comes into play as the GA begins to converge on a minima or maxima, where the contents of the population become very similar making effective ranking and mating selection problematic. Methods for adjusting dynamic range are covered further in the Fitness Function section.

Alternatively in the fitness evaluation phase it is often easy to check the range of the variable and penalize the resulting value of the fitness result such that this offspring will not be selected for mating in the next generation. This represents a simple (and predictable) but computationally inefficient solution (random number of illegal offspring are contained in each new population).

This represents an architectural decision, for me these are the more interesting challenges of software design, approaching an art form. Usually the right choice is implementation dependent with a complex set of tradeoffs. All things being equal if you require faster convergence then culling in the operator phase might be more suitable. If you require predictable execution speed then culling in the fitness evaluation might be more suitable.

A direct floating point representation, where x1 and x2 are natively floating point values has a similar problem, but some hardware platforms allow efficient use of FLOOR and CEILING functions which clamp values inside a specified range.

This eliminates the creation of illegal values, but possibly creates a dynamic range problem. In these cases rather than the offspring being a constrained but statistically even distribution of the random values expressed by the parents encoding there is now a higher probability that a given set of offspring will contain values forced to the FLOOR (or CEILING) of the valid range.

Consider a four bit field where only the values of 0000 through 1000 are valid, this represents a 7 in 16 chance that a 4 bit random number generator will have its result FLOOR-ed to 1000. Overall, this means that 50% of the random number generator output values will be 1000 (7 in 16 values FLOOR-ed to 1000, plus the naturally occurring 1000 value), when ideally each value should have a 1 in 9 probability of occurring.

An alternative approach to either a fixed or floating point representation would be a lookup table function where specific binary (or Grey code values) represent specific real values:

Genetic Operators

Below I cover the classic GA operators. Genetic operator construction and implementation is a hot topic of GA research. There are many papers describing improvements to classic operators as well as some very interesting novel operators. Consider this a prime opportunity to use your creativity to improve the following:

Crossover

The crossover operator is applied to mating pairs based on probability. If the probability limit is reached then crossover occurs, otherwise the resultant offspring are simply copies of the parents. A commonly used crossover rate is about 0.75.

The mechanics of the classic single point crossover operator are simple. Using the chromosome structure from the integer example, with C[0:3],D[0:3] representing the two fields of Parent 1 and E[0:3],F[0:3] likewise for Parent 2, we have the following:

Parent 1:         C0  C1  C2  C3 D0  D1  D2  D3
Parent 2:         E0  E1  E2  E3 F0  F1  F2  F3
Crossover point --------------------^

The resulting offspring will look something like:

Offspring 1: c0 c1 c2 c3 d0 f1 f2 f3
Offspring 2: e0 e1 e2 e3 f0 d1 d2 d3

Offspring 1 shares the values of Parent 1 up to the crossover point and Parent 2 after the crossover point. Conversely Offspring 2 shares the opposite values of the parent pair.

There are many variations on offspring production. In what are termed constant population GA’s, two parents create two offspring. In other implementations the more fit of the two parents is permitted to continue in the next generation unmodified along with the offspring created by the mating of Parent 1. There is also interesting work being done on variable population GA’s which allow the population to grow and shrink based on some external value. One example would be to allow the population to grow or shrink based on the slope of the population’s average fitness for the previous N generations. Many other opportunities exist. Consult [5].

The classic dual point crossover operator adds another crossover point with the offspring either retaining the segment between the crossover points and exchanging the segments outside or visa versa. An example of exchanging within the crossover range

Parent 1:         C0  C1  C2  C3 D0  D1  D2  D3
Parent 2:         E0  E1  E2  E3 F0  F1  F2  F3
Crossover points ^--------------^
Offspring 1:      c0  c1  e2  e3 f0  f1  d2  d3
Offspring 2:      e0  e1  c2  c3 d0  d1  f2  f3 
There are many permutations to the single and dual point crossover implementations and offspring generation found in existing implementations. [3] and [4] explore the tradeoffs and underlying theory of preferred implementations.

Inversion

The inversion operator simply reverses the order of a range of values within the chromosome, again based on some usually fixed probability. This probability is classically applied for each chromosome rather than for each gene. The purpose is to introduce new gene combinations into the population, increasing diversity. Remember diversity is essentially a broadening of the populations search of the solution space.

Using the symbolic chromosome representation for clarity:

Offspring 1:     A  B  C  D  E  F  G
Inversion range       ^----------^
Result:          A  B  F  E  D  C  G

The inversion range points are also usually random values within the length of the chromosome. And like crossover inversion is applied on a probability basis. I commonly use an inversion rate of between .1 and .25 of the crossover rate depending on the problem I am trying to solve.

Mutation

The Mutation operator changes individual fields within a chromosome based on some, usually fixed, probability. Mutation is classically applied after all other genetic operators but before selection.

The purpose of mutation in biological processes is not understood analytically. Therefore it makes sense to also make the case that it is not well understood in GAs. The current general consensus is that while selection and crossover provide genetic mixing between parents to form new offspring, selection and crossover also have a tendency to “wipe out” valuable genetic information just as readily as they replace less valuable genetic information.

In another context imagine a GA nearing convergence on some value, as you might imagine genetic mixing of high quality candidates tends to form other high quality candidates. In essence what this means is that as a GA converges the individual elements of a population have a tendency to look similar. Genetic divergence is reduced, after all we are trying to converge on a solution. This is all well and good if the convergence point is some globally optimal minimum or maximum. However if the convergence value is a local maximum or minimum then the homogeneous nature of the population will not permit a diverse sampling of the solution space and the GA fails to find the optimal solution.

Typical implementations of mutation operators are just as varied as crossover operators but the original definition assumed a binary representation and used a very small probability that a given bit within a chromosome would be “flipped”. Each bit in a chromosome is walked and the probability function is evaluated. If the probability function is true for that bit, the new bit value is the inverse of the current value.

In symbolic chromosome representation, where each gene position represents a symbol, one possible implementation of mutation is to swap the positions of two adjacent genes.

Consider the traveling salesman problem where the task is to determine the shortest path between cities [A through G] organized on a 2 dimensional grid visiting each city only once. The fitness function would be the total path length. Each chromosome would then represent a possible path through those cities.

Offspring 1:            A  B  C  D  E  F  G
Mutation point  --------------^
Result:                 A  B  D  C  E  F  G

The mutation probability function is applied at each gene position in each chromosome of the population. In a swapping implementation care must be taken to adjust the probability of mutation downward to deal with the fact that each mutation candidate potentially effects two genes. Too much mutation causes problems in convergence while too little mutation causes premature convergence on local minimums or maximums.

A common starting point for mutation probability is 1 value out of 1000.

Selection

Selection is the process of deciding which chromosomes in the current population will pass their solution information to the next generation. There are a large variety of schemes for selection I will cover a few of the more commonly applied selection schemes.

Roulette Wheel

Classically roulette selection is calculated on the normalized fitness values of a population. Normalization is the summing of the fitness for the entire population and then dividing each member’s fitness by this sum.

Assume we have the following distribution of fitness in a population of 10 individual chromosomes. The sum of the fitness for the generation is 155. Each value in the Fitness column is divided by 155 to provide the Normalized Fitness. The Cumulative Norm column is the running sum of the Normalized Fitness values.

Chromosome Fitness Normalized Cumulative Probability
Fitness Norm of Selection

Chromosome Fitness Normalized Fitness Cumulative Norm Probability of Selection
0 3 0.019 0.019 0.19
1 12 0.077 0.096 0.77
2 17 0.109 0.205 1.09
3 7 0.045 0.250 0.45
4 9 0.058 0.308 0.58
5 11 0.070 0.378 0.70
6 13 0.083 0.461 0.83
7 42 0.270 0.731 2.70
8 23 0.148 0.879 1.48
9 18 0.116 1.00 (.995)* 1.21
* We allow a slight inaccuracy here due to truncation effects, the final value is .995 but set to 1.00. This will have little effect on the properties of the GA.

The probability of selection column is a measure of the number of available slots within a 10 slot wheel for each chromosome. I have added this column strictly to illustrate the properties of the roulette wheel and the proportionality of selection to fitness value. Also I have arbitrarily multiplied the difference between the cumulative norm values of adjacent chromosomes by 10. So out of 10 spins chromosome 7 will likely be selected 2.7 times. Notice that the smallest fitness value has the smallest probability. From this is should be clear that this example is trying to maximize the fitness value.

The next step is to produce 10 random numbers between 0 and 1 (10 being equal to our population size). If the random number is between 0 and 0.019, chromosome 0 is added to the list of selected chromosomes. If it is between 0.019 and 0.096 then chromosome 1 is selected.

The key here is that the probability of a particular chromosome being selected is proportional to it’s fitness value. More fit chromosomes will tend to be replicated in the selection list than less fit.

Tournament Selection

Tournament selection involves randomly choosing two candidates from the current population, comparing their fitness values and choosing the more fit for mating. A fixed population GA of size N will require 2N tournaments.

This can be combined with roulette wheel selection where the two candidates are chosen by roulette wheel probabilities and then proceed with the tournament. This is also known as Stochastic Tournament Selection. The effectiveness of this scheme is somewhat in question. The performance of a Stochastic Tournament Selection GA with it’s additional calls to the random function, should be evaluated against the performance of a straight Tournament Selection GA for your particular problem.

Thresholding

In this selection method a cut off is chosen, typically a fixed value. All chromosomes below the threshold do not survive to the next generation, i.e. they die. Their replacements are formed by the random mating of the remaining individual chromosomes. In theory if N replacements are required then N/2 mating pairs are selected through some random process, tournament, fully stochastic, etc. Each mating pair generates two offspring. Odd values of N are solved by producing only one offspring from one pair. Thresholding is generally similar to Elitism.

Fitness

After the process of selection and mating the entire population is checked for fitness. Each chromosome in the population is evaluated against the fitness function to determine it’s rank and probability for mating in the next generation.

It is at this stage the the iteration count and best fit individual chromosome are checked against the iteration limit and fitness limit to determine if the GA needs to proceed to the next generation.

As previously mentioned, the fitness function is usually defines the problem, and therefore may leave little room for optimization. If you are trying to solve a function, that function is your fitness equation. The previous example:

y = 100(x12– x22)2 + (1 – x1)2    for -2047 ≤ xi ≤ +2048

involves 2 subtractions, 1 add, and 5 multiplies. Understanding the capabilities of your hardware will determine how this function should be implemented.

For other problems you may have more opportunity for optimization. Using a contrived example consider a traveling salesman problem of dimension 5. I have 5 cities organized on a two dimensional plane. The coordinates of the cities are:

City   X     Y
A      1.5   1.5
B      0.0   0.0
C      2.2  16.1
D     11.0   0.1
E      3.4   3.2

The task is to find the shortest tour which can start from any city, passing through the remaining cities only once and returning to the starting city. (These are very difficult problems as the number of cities increases. If interested Google “traveling salesman genetic algorithm” and you will see the vast amount of research targeted at this NP-Hard problem.)

As I said this is a contrived example, but it illustrates an important point. If we define the problem to be one of finding the ordered list of cities which results in the shortest relative path length, rather than defining the problem to be “what’s the length of the shortest path” then we can make a simple optimization which will improve the overall performance of the GA engine.

Given this new definition we can simply multiplying the coordinates of each city by 10, this moves your internal representation of the fitness function from a floating point calculation to an integer calculation without effecting the result.

Another factor to consider with fitness functions is dynamic range. GA’s tend to converge asymptotically to a particular value, which may or may not be the global optimum. Asymptotically implies that the engine becomes less and less able to find new individuals which dramatically improve the current best fit solution. This slowing of improvement in turn implies that the relative difference between any two randomly selected individuals in the population will also decrease. The up shot of this is that it will become increasingly more difficult for your selection mechanism to choose the “best of the best”. This reduction in the differences between individuals is called a reduction in dynamic range.

Scaling is a technique which improves dynamic range. This can take several forms. The simplest is to simply raise the fitness value of each individual to some power.

A more complex form of scaling which keeps track of the performance history of the GA has been shown to be effective. Suppose you are trying to determine the minimum tour in the traveling salesman problem, then in this form of scaling your GA would track the worst solution found in the previous N generations and use these N values to derive a moving average of the worst N solutions. The current generation would then be evaluated against their improvement over the average worst solutions. Now we are measuring the improvement value of each chromosome, not their absolute fitness value. The improvement value is now the metric used to determine the ranking of an individual chromosome, and of course a high ranking increases the probability of selection for mating.

Definitions

Alleles
The set of valid values for each locus in a chromosome. A chromosome formed of binary bits, the allele for each bit position would be {0,1}. For a chromosome formed of single characters their would be 26 possible alleles for each position in the chromosome.

Chromosome
Typical implemented as a string of symbols, characters, bits, etc, but can also be sets of real numbers. A chromosome is considered a candidate solution for the fitness function. It may or may not meet the criteria for solution. In the research the set of all chromosomes in a generation is considered the genotype.

Locus
An individual position within a chromosome. A chromosome formed of 8 bit values would have 8 loci.

Gene
The combination of locus and the current value of the allele for that locus.

Generation
GA’s are iterative. Each generation is one iteration of the GA engine. A generation consists of N chromosomes.

Non-deterministic Turing machine
A Turing machine with more than one next state for the given current state. Informally a Turing machine with conditional branch capability is non-deterministic relative to it’s current memory contents.

Polynomial Time (also known as Big-O Notation)
O(g(n)) is a measure of the time required to execute an algorithm “f” based on the problem size “n”, i.e. f(n) = O(g(n)). Informally this means that the execution time of the function f(n) is less than some constant multiple (in this case “O”) of g(n) and therefore there must exist constants c and k that satisfy 0 <= f(n) <= c(g(n)) for all n >= k.

NP
Non-deterministic Turing machine accepted in Polynomial time

About the Author

Jeff Nye is a microprocessor designer and software developer for a large semiconductor company and based in Austin, Texas. He designs processors as well as writes CAD software. Many of his tools utilize Genetic Algorithms for optimization issues.

References

[1] National Institute of Standards and Technology, “Dictionary of Algorithms and Data Structures”, http://www.nist.gov/dads
[2] State University of New York, “Stony Brook Algorithm Repository”, http://www.cs.sunysb.edu/~algorit
[3] Holland, J. H. 1962, “Information processing in adaptive systems”,
[4] Kenedy J., Eberhart, E.,2001, Swarm Intelligence, Morgan Kaufman
[5] Annunziato, Mauro, et al, “Adaptive Parameterization of Evolutionary Algorithms and Chaotic Artificial Populations”, SOPIN s.p.a, http://erg87067.casaccia.enea.it/stefano/papers/soco01.pdf
[6] De Jong, 1975, “An analysis of the behavior of a class of genetic adaptive systems”, Doctoral Dissertation, Univ. Michigan




Discuss this article in the forums
Print article

© 2003-2004 DevMaster.net. All Rights Reserved. Terms of Use & Privacy Policy Want to write for us? Click here