In a sense, the fitness measure is the way we define the problem to GP. The computer can not understand abstract instructions of what needs to be done, but it can analyze a given solution to a problem given a set of predefined rules. GP will then attempt to evolve individuals which have the best fitness values possible. In this sense, GP is an optimization technique, searching through the space of all possible computer programs for one that has optimal fitness.

The concept of fitness in GP is usually a numerical value. The numerical value is calculated by assessing a given individual over some test cases. Fitness is very problem-independent, so it is difficult to talk about in general, but there are several common threads among fitness calculations as described below. Don't worry if you're a little confused, we will examine fitness as it relates to the three simple problems we described in the last section.

The first fitness concept is the most problem-independent, known as raw fitness. There are no constraints on raw fitness. It is simply an expression of the fitness in the natural terminology of the problem. Higher values might mean better individuals, or lower values might mean better values-- it all depends on the problem at hand.

Because raw fitness is so loosely-defined, GP rarely functionally uses it for anything. It is usually only used to provide feedback to the researchers working with the GP run. Rather, the convention in GP is to convert raw fitness to standardized fitness. The only requirements for standardized fitness is that they cannot be negative and smaller values represent better individuals. It is customary, but not absolutely necessary to define an optimal individual as value 0. It might not be known exactly how an optimal individual would behave, particularly on a problem where the solution is not currently known, so this is not always possible. Standardized fitness must be defined by the requirements of the problem at hand.

Because standardized fitness is not range-bound, certain calculations in GP might become a little confusing. It is easy to tell whether one individual is better than another, but it is not always clear how much better. For this reason, we can convert standardized fitness to adjusted fitness. Adjusted fitness can be calculated according to the following Java code:

public double getAdjustedFitness(double standardizedFitness) {
	double denominator = 1 + standardizedFitness;
	return 1 / denominator;

For adjusted fitness, the values are always between 0 and 1, where higher values represent better individuals. This is very convenient. Note that if the value of the standardized fitness is zero (a perfect fitness), the adjusted fitness is one. Also note that if the value of the standardized fitness is infinity (a complete failure), the adjusted fitness is zero!

Now we can calculate fitness ratios and determine exactly how much better one individual is as compared to another in a consistent way. Still, there is one further fitness concept that we will find useful. The normalized fitness is calculated by adding up the adjusted fitness values for all individuals in the population and then dividing each adjusted fitness by that sum. The advantage to doing this is that the fitness ratios remain the same, but the sum of all normalized fitness values is one. This means that we can directly translate normalized fitness values into probability values, which is particularly conevnient when selecting individuals in a population.


Once you have the standardized fitness, the calculations remain the same for every problem. For this reason, when discussing GP problems, one only needs consider how raw and standardized fitnesses are measured. Let us examine how they are defined for our three simple problems.

Symbolic Regression

For the symbolic regression problem, our raw fitness measure would ideally measure how close a given curve is to fitting the data. One way of doing this is to consider the error. The error at a given data point is the vertical distance between the predicted point and the data point. Looking at the picture to the left, you can see that the green curve represents an attempted solution to the problem. The red lines show the residuals, or the error at each point.

A large amount of big residuals would mean a pretty badly-fit curve. However, we cannot simply add up all of these residuals. The convention in statistics is to add up the squares of all the residuals. This number is referred to as the residual sum of squares (original, huh?).

Conveniently, the raw fitness for the symbolic regression can also be the standardized fitness. If the set of data points is particularly large, calculating the residuals for every single point might not be computationally reasonable. For this reason, it is common to pick a number of points at random and calculate the residuals for those points only. There are other advantages to randomizing the fitness cases, but we will not consider them now.

The following Java code shows how the raw fitness can be calculated for the symbolic regression problem:

public double getRawFitness(SymbolicRegressionIndividual individual, 
							double[] xPoints, double[] yPoints) {
	double sum = 0;
	for(int i = 0; i < xPoints.length; i++) {
		double residual = yPoints[i] - individual.getValue(xPoints[i]);
		sum += residual * residual;
	return sum;

Cart Centering

Fitness for the cart centering problem requires us to think back to our days of high school physics. Using the simple kinematics equations, we can calculate how a given thrust will affect the cart's position. We can then run several simulations and add up the time it took on each trial to get the raw fitness. Luckily, raw fitness can also be our standardized fitness for this problem too. But, what initial values do you give the cart, how many times do you test it, and how do you handle individuals that never solve the problem. If you don't understand physics very well, I would move on to the next section as the following explanations are a little math-heavy.

The initial values are usually provided by range-bound random variables. In Koza's book, he defined the range of the initial velocity to be between -0.75 m/s and 0.75 m/s. For the initial position, the range is -0.75 m through 0.75 m. To make calculations easiest, he also defined the ratio of force-to-mass to equal one so that the acceleration from the thruster is always one meter/second2, but any value could have been used. With this range of values, an ideal individual can center the cart within 1-3 seconds from any initial value of velocity and position.

The number of times to test the individual can vart based on a number of factors, but 10-20 trials usually gets a good idea of how well an individual solves the problem.

Of course, especially in the begging, not all individuals will center the cart at all. Some randomly generated strategies may simply be to accelerate forward indefinitely, which would certainly never bring the cart to rest at the center. For this reason, we need a timeout value, usually 10 seconds. If an individual has not centered the cart in under 10 seconds, it is simply assigned a time of 10 seconds for that trial.

Of course, we don't expect the cart to come completely to rest exactly in the center, so we need to assign a margin of error, usually 0.01. If the velocity and position are ever both close to zero within the margin of error, we call it a success. We also don't want to be doing any complicated calculus, so we will be discretizing time into 0.02 second intervals.

The following Java code shows how the raw fitness can be calculated for the cart centering problem:

public double getRawFitness(CartCenteringIndividual individual) {
	double sum = 0;
	for(int i = 0; i < 20; i++) {
		double velocity = 1.5 * Math.random() - 0.75;
		double position = 1.5 * Math.random() - 0.75;
		double time = 0;
		while(time < 10 && (Math.abs(velocity) > 0.01 || Math.abs(position) > 0.01)) {
			//individual.getDecision(v, x) returns either 1 or -1
			int thrust = individual.getDecision(velocity, position);
			velocity += thrust * 0.02;
			position += velocity * 0.02;
		sum += time;
	return sum;

Santa Fe Ant Trail

The goal of the Santa Fe Ant is to consume as much food along the trail as possible. So it would be natural to consider the fitness to simply be the amount of food eaten in a simulation. Now, we want to avoid individuals that get a lot of food by simply examining every space on the board, so we will give some kind of time limit. Since there are 1,024 spaces on the grid and 89 pieces of food, we decide to limit the time to 400 time steps. We consider a timestep to pass every time the ant either moves or turns.

So here we have an easy way to define the raw fitness, amount of food eaten in the time limit. However, note that a perfect individual that eats all the food would receive a fitness score of 89, while an individual that eats no food receives a fitness of 0. This means that raw fitness cannot be used as standardized fitness, unlike the other problems. Instead, we will define standardized fitness as [89 - (food eaten)].

The actual implementation is a little more complex than the other two problems, so we will not discuss it here.

By now, you probably have a fairly firm grasp on the concept of fitness. You understand both of the inputs to a GP system. Now we will begin discussing how GP actually operates. The first step, as you may recall from the What Is Genetic Programming? section, is The Initial Population.