An alternative introduction to Genetic and Evolutionary Algorithms

Paulo Zemek

4.94/5 (32 votes)

Jun 6, 2014

CPOL

23 min read

62795

1093

This article presents a very small evolutionary algorithm capable of discovering some math expressions to give the results for the values you provide.

Introduction and Background

When I was at university I had classes about Artificial Intelligence and Genetic Algorithms. One doesn't really depend on the other, yet it is common to relate them.

One of the things that teachers loved to say was that logical persons had more difficulty to understand those algorithms than non-logical persons as, as they said, those algorithms aren't based on logic.

Well, I disagreed at the time and I continue to disagree. Genetic Algorithms are logical algorithms but instead of using a previously known formula to solve a problem or trying every possibility, they start with some "random possibilities to try" (usually of some kind of math formula) that may or may not have a solution to the problem and, if none of them have, the algorithm develops new populations of possibilities based on the previous ones using random (yet controlled) changes on them, which can include mutations, permutations, crossovers or some other techniques.

The random development of new populations is where there's "no logic". In fact, the random is used simply because we, the humans who develop the application, don't know a better way to directly change the populations towards the good result. If we knew that, we could simply write that algorithm, which will be better but will not be a Genetic Algorithm anymore.

Yet, even with the random changes, there's an analysis to decide which solutions are the "best candidates" to achieve the right result. This is still a normal programming algorithm and its quality will directly interfere the results of the Genetic Algorithm (in terms of performance, the probability of finding solutions that work and the probability of finding the fastest solution among those that work).

Actual versus old documents and examples

When I studied Genetic Algorithms teachers said that those algorithms had cells. Cells reproduced themselves, cells could talk to other cells etc.

Now, when I look for Genetic Algorithms on the internet I only find documents talking about chromosomes and genes. Maybe my teachers used the wrong sources as it seems natural for Genetic Algorithms to have chromosomes and genes. Yet, the idea of cells still allows for Evolutionary Programming, which is a broader domain that includes Genetic Algorithms and, in some cases, it is only a matter of names.

Returning to the Genetic Algorithms, they are explained as working over a set of chromosomes "usually represented as a string", yet I didn't find any example that's an exception, so it seems that all genetic algorithms use strings and the entire algorithm is explained by showing how those strings are modified.

To see something extremely simple, some examples are like this:

Our target chromosome must be like this:

01010101

Then, we start our code with chromosomes like:

00000000

and

11111111

And during the reproduction phase they may have crossovers(in this case, a single crossover at any place):

00001111
11110000
00111100

And also some mutation (in this sample, exactly one mutation per reproduction):

01001111
00001110
00101100

Then, the next step is to chose the best generated values to see if a result was found or if new reproductions are needed.

The Problem

Another thing that teachers loved to say was that Genetic Algorithms are extremely hard and, again, I didn't agree. Surely one needs to understand the programming language and it may be hard to follow an algorithm that has "random" factors and executes millions (or even more) operations per second. Yet, what makes those algorithms hard, in my opinion, are the bad examples that make everything look more complex than needed.

To start, I don't like the fact that those algorithms are explained as depending on strings, as those strings need to be parsed later. But what I really hate, as it makes the idea unusable, is the fact that we must give the answer as input. So, why write code to try to find the answer that we already know?

I am not talking about writing code that will find an answer that the user knows while the computer doesn't. That's called testing. I am talking about a code that must always receive the right answer as its input. It is useless independently if it will always find the right answer, as to do it we can simply avoid any complex logic and return the input value directly.

Genetic Algorithms are meant to find answers to those problems that we don't know the final answer and don't know a better algorithm to find it. We may know some questions and some answers, but we expect that it "learns enough" to be able to give the right answers even to those questions that we, the users, don't have the answers yet.

A Better Example

To me, the math problems are a better example. There are many exercices that give us some input values and their outputs and then ask us to give the output for another value that's not on the samples. For example:

The input 1 gives the value 3;
The input 2 gives the value 5;
The input 3 gives the value 7;
The input 10 gives the value 21.

What's the result for the input 7?

As humans, we can easily answer this by either summing up 1 in the input and 2 in the output until we arrive to the right value or we can do better and develop a formula, like this:

F(x) = (x*2) + 1

So, with this formula, F(7) = 15, all the input and outputs can be tested and, better yet, by storing the formula we can generate the results for any other input.

Why is this a better example?

We will only give some inputs and outputs to the Genetic Algorithm and it must discover a formula that generates those results. We don't expect it to simply return those values as the right answer. If the algorithm discovers the right formula, it will be capable of giving answers to any input, not only those used to develop the formula. In a sense, it will "learn" how to generate results from the inputs.

But, to create such an algorithm, I will not use "chromosomes". To me that's not the right idea. I will use cells or, as I decided to call them, Neurons. So, the algorithm I am going to present in this article may not be considered as a Genetic Algorithm because of the terminology and because I am not manipulating strings directly... but you know, it is only a matter of preference as I could manipulate strings and write parsers for them (that will only slow things down, but it's possible).

Note: Aside from the name Neurons, this article and the code that accompanies it have nothing to do with neural-networks.

The Kinds of Neurons

In my sample I will use two kind of Neurons: Simple Neurons and Binary Neurons.

Simple Neurons will either return the input value directly or will return a constant value, effectively ignoring the input. It would be possible to create different kinds of Simple Neurons but I decided to stop there.

Binary Neurons will apply some math operations (like addition, subtraction, etc) over two other neurons (so, they communicate to other neurons and generate a new answer based on both answers received from that communication).

The Binary Neurons may communicate to any other neuron, including other Binary Neurons. This is how the algorithm can create really big math expressions.

Then, the purpose of the Genetic Algorithm (or Evolutionary Algorithm, if you prefer) is to create some first generation "beings" (we can call them brains too, as they are composed only of neurons), check if any of them generates a valid result and, if not, "reproduce" the best ones (I will not discuss if neurons reproduce or not, but think about the entire "brain" being clonned allowing some changes). Of course, after the reproduction, the verification if a valid result was found will be reexecuted and new reproductions may happen... this can be repeated "forever" if needed.

Each neuron individually can be seen as part of a math expression. The reproduction of the neurons allows the parts of the expressions to mutate. Yet, there's no direct string manipulation so, even if a formula may not generate the right values, there's no chance of writing an incomplete math expression. If we were using direct string manipulations over human readable formulas, we could very easily create expressions that open more parenthesis than they close.

We will evaluate how near or far we are from the right result by getting the differences from the results we will receive from the expression compared to the expected results given as input.

Programming The Neurons

To create the algorithm, I started with the INeuron interface. It is composed of the following members:

int? Execute(int input);
INeuron Reproduce();
int Complexity { get; }

The Execute() method receives an input and must generate a result. I made it return a nullable int because there are some expressions that may fail (for example, dividing a value by zero) and I want a result that represents a failure without generating exceptions;
The Reproduce() method is expected to, well, reproduce a Neuron. It is the neuron itself that decides how to reproduce. For example, the Binary Neurons usually reproduce each one of the neurons to which they communicate to then create a neuron that communicates with those new neurons;
The Complexity will be used when chosing the best results. If two or more expressions give the same results, we will chose the less complex ones as the best ones (the smaller formulas).

Then, I created the InputNeuron, the IntNeuron and the BinaryNeuron.

The InputNeuron simply returns the input value as its result. It doesn't really reproduce as it never changes, so its Reproduce() returns itself. To the computer, it is not important that it doesn't really reproduce as the "cell" is not going to age. In a formula like x + 1, this cell represents the x;
The IntNeuron returns a constant value. During its reproduction it may create a child with a constant value that's a little bigger or smaller than its own's. In the formula x + 1, the 1 is achieved by an IntNeuron that holds the value 1;
The BinaryNeuron holds the math operations like addition, subtraction etc and does that operation over the results of two other neurons. So, in the x + 1 formula, the + is a BinaryNeuron that uses the addition operation over the InputNeuron and an IntNeuron. During its reproduction it may ask the other two neurons to reproduce (the most common case) or may decide to increase the number of cells (effectively putting a BinaryNeuron as one of its sub-neurons) or even decide to return the child of only one of its sub-neurons as its own child (effectively reducing the size of the expression).

In theory, it is better if a Simple Neuron can reproduce by becoming a BinaryNeuron, but I simply considered it easier to let the simple neurons only caring about themselves and making the number of cells increase or decrease based on the binary neuron. This means we must have binary neurons on the first generation or else our expressions will never be able to increase in size during a reproduction.

So, to see some code, this is how those classes look:

public interface INeuron
{
  int Complexity { get; }

  int? Execute(int input);

  INeuron Reproduce();
}

public sealed class InputNeuron:
  INeuron
{
  public static readonly InputNeuron Instance = new InputNeuron();

  public int Complexity
  {
    get
    {
      return 1;
    }
  }

  public int? Execute(int input)
  {
    return input;
  }

  public INeuron Reproduce()
  {
    return this;
  }

  public override string ToString()
  {
    return "x";
  }
}

public sealed class IntNeuron:
  INeuron
{
  private int _value;
  public IntNeuron(int value)
  {
    _value = value;
  }

  public int Complexity
  {
    get
    {
      return 1;
    }
  }

  public int? Execute(int input)
  {
    return _value;
  }

  public INeuron Reproduce()
  {
    int random = Program.GetRandom(11);

    if (random >= 2)
    {
      // we don't really need to create a new object if we don't want to make any changes, so
      // we simply return this.
      return this;
    }

    int amountToChange = 1;
    int percentualChange = Program.GetRandom(11);
    if (percentualChange > 0)
    {
      amountToChange = _value * percentualChange / 100;

      if (amountToChange == 0)
        amountToChange = 1;
    }

    if (random == 0)
      return new IntNeuron(_value - amountToChange);

    return new IntNeuron(_value + amountToChange);
  }

  public override string ToString()
  {
    return _value.ToString();
  }
}

public sealed class BinaryNeuron:
  INeuron
{
  public static BinaryNeuron Add(INeuron neuron1, INeuron neuron2)
  {
    return new BinaryNeuron((a, b) => a + b, " + ", neuron1, neuron2);
  }
  public static BinaryNeuron Subtract(INeuron neuron1, INeuron neuron2)
  {
    return new BinaryNeuron((a, b) => a - b, " - ", neuron1, neuron2);
  }
  public static BinaryNeuron Multiply(INeuron neuron1, INeuron neuron2)
  {
    return new BinaryNeuron((a, b) => a * b, " * ", neuron1, neuron2);
  }

  private readonly Func<int, int, int?> _function;
  private readonly string _operatorName;
  private readonly INeuron _neuron1;
  private readonly INeuron _neuron2;
  public BinaryNeuron(Func<int, int, int?> function, string operatorName, INeuron neuron1, INeuron neuron2)
  {
    _function = function;
    _operatorName = operatorName;
    _neuron1 = neuron1;
    _neuron2 = neuron2;
    Complexity = neuron1.Complexity + neuron2.Complexity;
  }

  public int Complexity { get; private set; }

  public int? Execute(int input)
  {
    var value1 = _neuron1.Execute(input);
    if (!value1.HasValue)
      return null;

    var value2 = _neuron2.Execute(input);
    if (!value2.HasValue)
      return null;

    try
    {
      return _function(value1.Value, value2.Value);
    }
    catch
    {
      return null;
    }
  }

  public INeuron Reproduce()
  {
    var neuron1 = _neuron1.Reproduce();
    var neuron2 = _neuron2.Reproduce();

    if (Program.GetRandom(100) != 0)
    {
      if (neuron1 == _neuron1 && neuron2 == _neuron2)
        return this;

      return new BinaryNeuron(_function, _operatorName, neuron1, neuron2);
    }

    return Program._ReproduceNeuron_Random(this, neuron1, neuron2);
  }

  public override string ToString()
  {
    return "(" + _neuron1.ToString() + _operatorName + _neuron2.ToString() + ")";
  }
}

Note: The BinaryNeuron is bigger in the sample as it deals with more expressions. At this moment, I only want to present the idea, not the full class.

The main algorithm

The main algorithm can be reduced as:

Generate base population of "beings" (or brains, as they are composed only of neurons);
Do a loop that:
1. Orders the beings by how good they are compared to the expected results;
2. Check if the algorithm found a good result;
3. Maybe "kill" and replace complex or repetitive beings by new first generation beings;
4. Reproduce the best beings.

So, for the main algorithm to work, we must have some input values with the expected results. In the the application I ask the number of input/ouput pairs that will be given then I ask for those pairs. How those inputs are really given is not important to the algorithm, but this is how I did it in the sample.

The algorithm must start with some "first generation" neurons. In my implementation those neurons are randomly generated using:

Some IntNeurons that may have the values 1, 2, 3 or 10, as most expressions use small values like this;
Any of the user provided values as an IntNeuron;
The InputNeuron;
BinaryNeurons of any of the possible operations over other randomly generated neurons.

It is important to note that how the first generation of Neurons is generated may greatly improve or reduce the quality of the algorithm. I didn't made a big study on that (and for real applications that may be necessary), I simply decided to use values that looked natural to me. In most formulas that I know there's a direct relation to at least one of the input or output values, which may be multiplied, subtracted or even raised to a power value related to any of the other input/output values or to the values 1, 2, 3 or 10. So, I decided to use those as the values that may appear in the first generation.

So, even if the first population is generated randomly, it is not simply built of "random" values.

The Sample

At this point, I strongly recommend that you download the sample and try it with values like:

Input	Output
1	8
2	64
3	216

Input	Output
1	2
10	11
57	58

Or more complex values, like:

Input	Output
1	1
2	-4
3	3
4	-8
11	11
12	-24

Input	Output
1	1
2	0
3	3
4	0
5	5
6	0

It is important to note that the sample uses integer variables only. So, 1 / 2 ends-up being zero. If it is multiplied by 2 (or any other value) later it will keep the zero result.

So, take this into accout if you find formulas that divide a value then multiply it again by the same value, as the final value may be zero instead of the original value.

The Problems

I hope that you have already tested the sample application. Maybe you saw that sometimes it is really fast to give a result, sometimes it is slow or it simply never finds a result and that for the same input/expected results it may generate different formulas.

This is all related to the "problems" faced by genetic algorithms. So I will try to explain some of them.

Diversity

It is important to have a good amount of "beings" to chose from. I opted to use 5700 beings at a time. Using only some beings at a time may be bad because we may go only in "one direction" when trying to find the results. Using much more beings would be the best solution if they could work really in parallel as in the real life but, in computers, they consume CPU time. So we must try to use a value that's big enough to allow diversity without being a performance problem.

I try to keep diversity by reproducing half the "best beings" (each one with 2 childs, this is how they fill all the 5700 available beings). I could use different formulas like allowing the best being of all to have more childs than the others. I am pretty sure many optimizations can be made here.

I don't go to the point of forcing diversity. That would be possible by keeping "groups" of neurons. For example, creating groups of non-complex neurons and groups of complex neurons... or groups of neurons that only use base math operations and groups that use other operations. In fact, the complexity of the expressions is only calculated using the amount of neurons involved, but in a better algorithm we could classify easier to solve operations (like addition) as being simpler than expressions that involve division or power operations.

Selecting the best results

The diversity of the beings and "going into one direction" is actually part of another problem. How do we select the best results?

The sample deals with math formulas, which generate integer results. So...

Should we use only the difference between the expected values and the results given by the beings?
Should we consider the amount of right values versus the amount of wrong values, independently on the amount of difference?
Should we use a combination of those? Or another kind of analysis?

To explain the problem, consider the input values 1, 2 and 3 and the results 11, 12 and 13.

If we consider only how many results got the right values, an expression like F(x) = x + 9 will be worse than a expression like F(x) = x * 6. This is because x * 6 will at least get the input 2 -> result 12 right while the other expression will miss all the values.

If we consider only the sum the differences from the right values, x + 9 will be better than x * 6. After all, even if it doesn't give any right values now, it is off by 1 for every input, having a total difference of 3. The other formula has a total difference of 10 compared to the expected result values.

But the formula F(x)=12 (yes, a constant) will be considered the best if compared to the other 2 solutions. It will get one value right and will only have a total difference of 2 for the other values. But we know that a simple constant will never be able to give the right values for all the inputs. A real formula (the addition one) can, and it is really near the right result (change the 9 by a 10, and it finds the right answer).

Now imagine that we were only using the "best" formula to reproduce (in this case, the constant 12). It can be reproduced as 11 and as 13 (or even some other value). Yet the 12 will be the best result. Considering my implementation, a constant variable will never reproduce as two cells, so we will be stuck with a no-solution, only because the discovery of the "best" solution used the wrong analysis to get the best result.

This actually means that deciding which being gives the best result is pretty important, as a bad decision may direct the algorithm to sub-optimal results. This is why it is said that most genetic algorithms require a domain-specific way of selecting the best candidates (or domain specific Fitness functions). Yet, genetic algorithms are used for more complex situations, where we don't really know the answer, so we try to use a "fair" selection and, to solve those situations where we are not really chosing the best ones, we keep a good diversity so some of the not so good alternatives (by the function that does the analysis) may evolve as the best one.

The sample algorithm

In my algorithm, I consider that an expression that gives a single good result is not better than one that misses all of them (this is to avoid considering constants that give one of the result values as better than a real formula). Yet, to try to go into the right direction, the total difference from the expected results plus the complexity of the expression is used. The smaller the value, the "best" it is.

When an expression generates two or more good results, it is considered better than one that generates less good results, independently on the total difference of the results. But, comparing to another expression that gives the same amount of good results, we still consider the total difference from the expected results plus the complexity. In the end, we benefit expressions that have more good results first, then those that are simpler and that have smaller differences to the expected results.

Of course this may still be problematic. We can have a formula that has all the right operators and only missed the constant values that's killed by a formula that has completely wrong operators but is giving "better values" right now. So, this is an area really important for professional uses.

Mutation Ratio

The mutation ratio is another trait that's very important. If it is too small, we may run several generations with the "same" expressions simply because the population is not changing, and we know that doing this will be a performance problem.

If the mutation factor is too big, we may be near the result in one generation and at the next run we go too far, changing much more items in the expression than needed.

In my opinion, it would be better if we could guarantee between 1 and 2 mutations per being reproduction, never allowing no mutations and never allowing too many mutations. But the algorithm of the sample is simply random. It may do more changes per reproduction, it may not do any change after the reproduction. I tried to configure the mutation ratios to work for me (usually very small formulas) but having the capacity to adjust the mutation ratios (maybe storing them as a trait of the cells/beings themselves) could help generate better results (or simply find them faster).

Crossover?

My algorithm doesn't deal with crossovers at all. I see possible crossovers working like this:

Simple cells don't crossover. If a simple cell is involved, then either the simple cell is returned or the "other" cell, independently if the other cell is simple or not;
Considering that we only have two kinds of cells, when a binary cell tries to crossover with another binary cell, it is possible that either one of its sub-expressions is replaced by either-one of the sub-expressions of the other cell, but the crossover doesn't need to happen for every cell (the same way a mutation doesn't happen at every cell when reproducing), so a binary cell may not crossover directly, yet it may ask to one of its sub-cells to crossover with one of the sub-cells of the other binary cell. This means that in big expressions, we may have big crossovers (like taking half of the other expression directly) or only small crossovers.

As I already said. I didn't do it in this sample... if you want, feel free to try it.

Sub-optimal solutions

One of the possible problems of genetic algorithms is the generation of sub-optimal solutions. As already explained, the diversity of the generated populations is there as a way to try to avoid bad beings from killing the good ones simply because "they look better" at a given time.

But actually there are two problems related to sub-optimal solutions. One of them is never finding a solution simply because we are trying to evolve the wrong beings (or maybe because there isn't a perfect solution). Another problem is finding a solutions that works, but that does much more work than needed.

In the sample application, it will never give you a result if it doesn't find all the right answers. In the real world it is possible that we have input data coming from "90% correct" sources. So, we may want to get an answer even if it is not 100% right.

Aside from that, by executing the sample application you will see that if you give inputs like: 1, 2, 3, that must generate values like 11, 12, 13, it is possible that you get a result like (9 + x) + 1. This is the other kind of sub-optimal solution, as the expression is bigger than it needs to be.

Actually, the sample stops on the first expression that gives a valid result for all the input values, independently of its complexity. For real situations, it could be possible to continue looking for better solutions even after finding one that works 100%. We do differently and continue to find solutions that give the same results but that are somewhat simpler (faster to execute, that have less operations etc).

And we must not forgot that Genetic Algorithms may be combined with other solutions. For example, transforming an expression like (9 + x) + 1 into an expression like x + 10 may be easily done by an algorithm specialized in simplifying expressions. Such algorithm may be completely incapable of discovering the expressions for some set of inputs and outputs but may do great jobs in reducing already generated expressions by eliminating "no-ops" (like x * 0), by replacing operations that evaluate into constants (like (x + x) / x by 2) or even by finding faster operations (x power 3 could be faster if written as x * x * x).

Giving up

The sample tries to find a solution during 5700 iterations. If it doesn't find one, it resets completely. That is, it creates an entire new initial population and starts again. You, as the user, are free to stop it at anytime but the algorithm itself will never stop if it doesn't find a solution, it will only do those resets.

For more professional solutions, the algorithms themselves may be programmed to give up if they don't find a solution for some time or if they simply don't see any evolution after a certain number of iterations. And, depending on the situation, they will still give a result saying it is not 100% but "acceptable".

Parallelism

Many documents say that genetic/evolutionary algorithms are great candidates for parallelism but the sample I am giving doesn't use any parallelism.

I actually tried to keep the code simple. It is easy to make it parallel by replacing most fors by the Parallel.For() method. The problem is that the Random class is not thread-safe, so we need to do locks when calling the Random.Next() method. Well, locks actually reduce the real parallelism of the code so, if we want to run things really in parallel, the best would be to ignore the Parallel.For() method and allocate one thread per CPU, each one already holding its own Random instance and each one knowing over which part of the array they should work on. It would be important to initialize every Random instance with a different seed, or else two or more CPUs may end-up generating exactly the same values, all the time.

So, even if the algorithms themselves are prepared to be parallel, as one being doesn't depend on other beings, the fact that the Random class is not thread-safe means that we should find a solution, which in my opinion should not be part of this article, as there are many different ways to solve the problem and I think that exploring them has enough content to be a full article on itself.

Conclusion

Genetic and evolutionary algorithms aren't hard. They are simple algorithms that, by lacking better alternatives, utilize random generated values as part of their process. What really makes them hard are bad examples and the fact that they are used to solve very complex problems and together with other complex algorithms. But, when well isolated, they are simple algorithms that keep reproducing good solutions with some random changes to try to make one even better.

In the end, evolutionary algorithms help us achieve incredible results without requiring us to fully understand the problems that are being solved.