Download demo project (includes source code) - 11.5 Kb

Introduction

This article describes how to solve a logic problem using a Genetic Algorithm. It assumes no prior knowledge of GAs. In fact, half of this article is dedicated to explaining the internal structure of a Genetic Algorithm.

So what is the problem domain we are trying to solve?

Well, GAs can be used to solve many problems. In fact, GAs have been used to grow new mathematical syntax trees, train multi-layer neural networks, to name but a few instances.

However, for this example, I have used a simple card splitting excercise, which is as detailed here:

You have 10 cards numbered 1 to 10
You have to divide them into two piles so that:
- The sum of the first pile is as close as possible to 36.
- And the product of all in the second pile is as close as possible to 360.

Now, I am not saying that this could not be done by hand, using old fashioned brain juice, it's just better suited to a GA, as it could take 100s or even 1000s of different combinations to get the correct result. Well, probably not that many for this simple problem, but it certainly could take a lot of combinations for a more difficult problem. Suffice to say, it is just good fun to do it with a GA. So, let's carry on.

So what is a Genetic Algorithm?

Well, Wikipedia says this:

A genetic algorithm is a search technique used in computing, to find true or approximate solutions to optimization and search problems, and is often abbreviated as GA. Genetic algorithms are categorized as global search heuristics. Genetic algorithms are a particular class of evolutionary algorithms that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover (also called recombination).

Genetic algorithms are implemented as a computer simulation in which a population of abstract representations (called chromosomes or the genotype or the genome) of candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem evolves towards better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible. The evolution usually starts from a population of randomly generated individuals, and happens in generations. In each generation, the fitness of every individual in the population is evaluated, multiple individuals are stochastically selected from the current population (based on their fitness), and modified (recombined and possibly mutated) to form a new population. The new population is then used in the next iteration of the algorithm.

Follow that?? If not, let's try a diagram. (Note that this is a Microbial GA, there are lots of GA types, but I just happen to like this one, and it's the one this article uses.)

I prefer to think of a GA as a way of really quickly (well, may be quite slow, depending on the problem) trying out some evolutionary programming techniques, that mother nature has always had.

So how does this translate into an algorithm (this article uses a Microbial GA, but there are many other varieties)?

The basic operation of the Microbial GA training is as follows:

Pick two genotypes at random
Compare Scores (Fitness) to come up with a Winner and Loser
Go along genotype, at each locus (Point)

That is:

With some probability (randomness), copy from Winner to Loser (overwrite)
With some probability (randomness), mutate that locus of the Loser
So only the Loser gets changed, which gives a version of Elitism for free, this ensures that the best in the breed remains in the population.

That's it. That is the complete algorithm.

But there are some essential issues to be aware of, when playing with GAs:

The genotype will be different for a different problem domain
The Fitness function will be different for a different problem domain

These two items must be developed again, whenever a new problem is specified.

For example, if we wanted to find a person's favourite pizza toppings, the genotype and fitness would be different from that which is used for this article's problem domain.

These two essential elements of a GA (for this article's problem domain) are specified below.

1. The Geneotype

//the genes array, 30 members, 10 cards each
private int[,] gene = new int[30, 10];

Well, for this article, the problem domain states that we have 10 cards. So, I created a two dimensional genes array, which is a 30*10 array. The 30 represents a population size of 30. I picked this. It could be any size, but should be big enough to allow some dominant genes to form.

2. The Fitness Function

Remembering that the problem domain description stated the following:

You have 10 cards numbered 1 to 10
You have to divide them into two piles so that:
- The sum of the first pile is as close as possible to 36.
- And the product of all in the second pile is as close as possible to 360.

Well, all that is being done is the following :

Loop through the population member's genes
If the current gene being looked at has a value of 0, the gene is for the sum pile (pile 0), so add to the running calculation
If the current gene being looked at has a value of 1, the gene is for the product pile (pile 1), so add to the running calculation
Calculate the overall error for this population member. If this member's geneotype has an overall error of 0.0, then the problem domain has been solved

//evaluate the the nth member of the population
//@param n : the nth member of the population
//@return : the score for this member of the population.
//If score is 0.0, then we have a good GA which has solved
//the problem domain
private double evaluate(int n)
{
    //initialise field values
    int sum = 0, prod = 1;
    double scaled_sum_error, scaled_prod_error, combined_error;
    //loop though all genes for this population member
    for (int i = 0; i < LEN; i++)
    {
        //if the gene value is 0, then put it in the sum (pile 0), 
        //and calculate sum
        if (gene[n,i] == 0)
        {
            sum += (1 + i);
        }
        //if the gene value is 1, then put it in the product (pile 1), 
        //and calculate sum
        else
        {
            prod *= (1 + i);
        }
    }
    //work out how food this population member is, based on an overall error
    //for the problem domain
    //NOTE : The fitness function will change for every problem domain.
    scaled_sum_error = (sum - SUMTARG) / SUMTARG;
    scaled_prod_error = (prod - PRODTARG) / PRODTARG;
    combined_error = Math.Abs(scaled_sum_error) + Math.Abs(scaled_prod_error);

    return combined_error;
}

Using the code

The demo project attached actually contains a Visual Studio 2005 solution, with the following two classes.

Program class

Is the main entry point into the Simple_GeneticAlgorithm application. All this class does is create a new Simple_GeneticAlgorithm object and call its run() method.

using System;
using System.Collections.Generic;
using System.Text;

namespace Simple_GeneticAlgorithm
{
    class Program
    {
        //main access point
        static void Main(string[] args)
        {
            //create a new Microbial GA
            Simple_GeneticAlgorithm GA = new Simple_GeneticAlgorithm();
            GA.run();
            //read a line, to stop the Console window closing
            Console.ReadLine();
        }
    }
}

Simple_GeneticAlgorithm class

Runs the GA to solve the problem domain.

using System;
using System.Collections.Generic;
using System.Text;

namespace Simple_GeneticAlgorithm
{
    public class Simple_GeneticAlgorithm
    {
        //population size
        private int POP = 30;
        //geneotype
        private int LEN = 10;
        //mutation rate, change it have a play
        private double MUT = 0.1;
        //recomination rate
        private double REC = 0.5;
        //how many tournaments should be played
        private double END = 1000;
        //the sum pile, end result for the SUM pile
        //card1 + card2 + card3 + card4 + card5, MUST = 36 for a good GA
        private double SUMTARG = 36;
        //the product pile, end result for the PRODUCT pile
        //card1 * card2 * card3 * card4 * card5, MUST = 360 for a good GA
        private double PRODTARG = 360;
        //the genes array, 30 members, 10 cards each
        private int[,] gene = new int[30, 10];
        //used to create randomness (Simulates selection process in nature)
        //randomly selects genes
        Random rnd = new Random();

        //empty constructor
        public Simple_GeneticAlgorithm()
        {
        }

        //Runs the Microbial GA to solve the problem domain
        //Where the problem domain is specified as follows
        //
        //You have 10 cards numbered 1 to 10.
        //You have to divide them into 2 piles so that:
        //
        //The sum of the first pile is as close as possible to 36
        //And the product of all in second pile is as close as poss to 360
        public void run()
        {
            //declare pop member a,b, winner and loser
            int a, b, Winner, Loser;
            //initialise the population (randomly)
            init_pop();
            //start a tournament
            for (int tournamentNo = 0; tournamentNo < END; tournamentNo++)
            {
                //pull 2 population members at random
                a = (int)(POP * rnd.NextDouble());
                b = (int)(POP * rnd.NextDouble());
                //have a fight, see who has best genes
                if (evaluate(a) < evaluate(b))
                {
                    Winner = a;
                    Loser = b;
                }
                else
                {
                    Winner = b;
                    Loser = a;
                }
                //Possibly do some gene jiggling, on all genes of loser
                //again depends on randomness (simulating the 
                //natural selection
                //process of evolutionary selection)
                for (int i = 0; i < LEN; i++)
                {
                    //maybe do some recombination
                    if (rnd.NextDouble() < REC)
                        gene[Loser, i] = gene[Winner, i];
                    //maybe do some muttion
                    if (rnd.NextDouble() < MUT)
                        gene[Loser, i] = 1 - gene[Loser, i];
                    //then test to see if the new population member 
                    //is a winner
                    if (evaluate(Loser) == 0.0)
                        display(tournamentNo, Loser);
                }
            }
        }


        //Display the results. Only called for good GA which has solved
        //the problem domain
        //@param tournaments : the current tournament loop number
        //@param n : the nth member of the population. 
        private void display(int tournaments, int n)
        {
            Console.WriteLine("\r\n==============================\r\n");
            Console.WriteLine("After " + tournaments + 
                              " tournaments, Solution sum pile " + 
                              "(should be 36) cards are : ");
            for (int i = 0; i < LEN; i++) {
              if (gene[n,i] == 0) {
                Console.WriteLine(i + 1);
              }
            }
            Console.WriteLine("\r\nAnd Product pile " + 
                              "(should be 360)  cards are : ");
            for (int i = 0; i < LEN; i++) {
              if (gene[n,i] == 1) {
                  Console.WriteLine(i + 1);
              }
            }
        }

        //evaluate the the nth member of the population
        //@param n : the nth member of the population
        //@return : the score for this member of the population.
        //If score is 0.0, then we have a good GA which has solved
        //the problem domain
        private double evaluate(int n)
        {
            //initialise field values
            int sum = 0, prod = 1;
            double scaled_sum_error, scaled_prod_error, combined_error;
            //loop though all genes for this population member
            for (int i = 0; i < LEN; i++)
            {
                //if the gene value is 0, then put it in 
                //the sum (pile 0), and calculate sum
                if (gene[n,i] == 0)
                {
                    sum += (1 + i);
                }
                //if the gene value is 1, then put it in 
                //the product (pile 1), and calculate sum
                else
                {
                    prod *= (1 + i);
                }
            }
            //work out how food this population member is, 
            //based on an overall error
            //for the problem domain
            //NOTE : The fitness function will change 
            //       for every problem domain.
            scaled_sum_error = (sum - SUMTARG) / SUMTARG;
            scaled_prod_error = (prod - PRODTARG) / PRODTARG;
            combined_error = Math.Abs(scaled_sum_error) + 
                             Math.Abs(scaled_prod_error);

            return combined_error;
        }

        //initialise population
        private void init_pop()
        {
            //for entire population
            for (int i = 0; i < POP; i++)
            {
                //for all genes
                for (int j = 0; j < LEN; j++)
                {
                    //randomly create gene values
                    if (rnd.NextDouble() < 0.5)
                    {
                        gene[i,j] = 0;
                    }
                    else
                    {
                        gene[i,j] = 1;
                    }
                }
            }
        }
    }
}

The results

Taking the last good population member results found, let's test it out.

2 + 7 + 8 + 9 + 10 = 36 in Pile 0, this is all good 
1 * 3 * 4 * 5 * 6 = 360 in Pile 1, this is all good

Points of Interest

I hope this article has demonstrated how to write a simple GA to solve a problem that we as humans would probably find hard to do manually. Remember this is a simple problem. what would happen if we upped the problem domain? A GA really is the way to go.

I will very shortly publish an article on a GA training a multi layer neural network to solve some logic problems. So if your'e into this sort of stuff, watch this space.

History

v1.0 - 08/11/06.