# Selection

By now, we have learned the language of Genetic Programming. The language of the problem, ie how we describe a problem to GP, was Fitness. The language of the solution, ie the building blocks GP will use to solve the problem was Representation. When using GP, these two things are the only thing that need to be provided for a particular problem. GP operates completely problem-independent from here-on-out. Before we can examine the actual flow of the creative process, let us examine how individuals are selected.

A good procedure for selection is at the heart of any adaptive system. Since the changes we are making to the programs will be essentially random, if we want any progresss to occur, we need to be selecting only the best programs to be allowed to participate in genetic operations and computations.

Selection is the process by which we choose individuals to be processed in some way. Because we have limited processing power, and we would like for GP to solve problems as effectively and efficiently as possible, selection helps us narrow the search space to hopefully optimal bands. We will be using selection to choose which individuals participate in genetic operations in GP. We will discuss these genetic operations in the next section.

In the field of GP, there are two basic paradigms for selection, fitness-proportional selection and tournament selection. Let us look at either.

## Fitness-proportional Selection

In fitness proportional selection, all individuals have a chance of being selected at any given point. Furthermore, the probability that a given individual will be selected is equal to its normalized fitness (see Fitness). Recall that the sum of all normalized fitnesses in the population is 1, which is quite convenient for the sake of fitness-proportional selection.

The following code shows how fitness-proportional selection works:

public double selectIndividual(Individual[] population, double[] normalizedFitness) { double decision = Math.random(); double sum = 0; for(int i = 0; i < population.length; i++) { sum += normalizedFitness[i]; if(decision <= sum) return population[i]; } //The following line should never be reached return null; }

Because more-fit inidividuals have higher normalized fitnesses, they have a higher chance of being selected and their genetic material has a higher chance of being preserved. Also, because all individuals have some chance of being selected, there is less loss of genetic diversity than if we just selected the single best individual of every generation.

A good selection method both provides selective pressure for the fitness of the population as a whole to increase, while preserving genetic diversity. If a selection procedure is too loose (extreme example: choose an individual at random), there will be no forward progress. If it is too tight (extreme example: choose the single best individual), mediocre individuals will quickly overwhelm the population and prevent further progress.

A selection procedure that works well for one problem, might not work well at all for other problems. Fitness-poportional selection works well for many problems, but doesn't allow much flexibility in terms of selective pressure. Another popular selection procedure is called tournament selection.

## Tournament Selection

The selective pressure of tournament selection can be adjusted by means of the tournament size parameter, which makes it a more flexible selective procedure than fitness-proportional selection. Tournament selection works by selecting a number of individuals from the population at random, a tournament, and then selecting only the best of those individuals.

Tournament sizes tend to be small relative to the population size. The ratio of tournament size to population size can be used as a measure of selective pressure. Note that a tournament size of 1 would be equivalent to selecting individuals at random, and a tournament size equal to the population size would be equivalent to selecting the best individual at any given point.

Because we only care about whether one individual is better than another, to save processing, we only need to consider standardized fitness for this selection procedure. Recall that better individuals have lower standard fitnesses.

Let us see some sample code for tournament size:

public double selectIndividual(Individual[] population, double[] standardizedFitness, int tournamentSize) { Individual[] tournament = new Individual[tournamentSize]; double[] tournamentFitness = new double[tournamentSize]; for(int i = 0; i < tournamentSize; i++) { int index = (int) (Math.random() * population.length); tournament[i] = population[index]; tournamentFitness[i] = standardizedFitness[index]; } Individual bestIndividual = tournament[0]; double bestFitness = tournamentFitness[0]; for(int i = 1; i < tournamentSize; i++) if(tournamentFitness[i] < bestFitness) { bestIndividual = tournament[i]; bestFitness = tournamentFitness[i]; } return bestIndividual; }

Tournament selection works by creating a tight selective pressure on a small local group of individuals. Neither of these two selection procedures is better than the other for every problem. There are also a whole slew of other selection procedures that may or may not be based on either of these. You could imagine a hybrid procedure that selects a tournament of individuals and then performs fitness-proportional selection within that tournament. The only way to know whether a selection procedure works best for your application is to test it against other procedures quantitatively. You can try to come up with qualitative reasons why one works better than another all you want, but at the end of the day it is the results that matter.

Now that you know how we will be selecting individuals, let's take a look at the Genetic Operations.