Three Simple Problems

Throughout this guide, we will occasionally refer to GP in general, but as this can be confusing, it is easier to discuss it as it relates to specific problems. For that reason, we are going to introduce three simple problems. Some of these are classic exercises in simple machine learning. These three examples appeared in Koza's 1992 Book and are used throughout GP literature as simple benchmark problems.

Symbolic Regression

If you recall anything about high school algebra, you probably remember linear regression. Linear regression was fitting a line model to a set of data. The model produced took the form f(x) = ax + b, where a was the slope and b was the y-intercept. Linear regression is fairly easy and fairly useful for data that may exhibit a linear relationship, but what if the relationship is unknown, as in the image to the left.

This data seems to produce some kind of curve, but the form of the curve is unknown. It could be parabolic, exponential, or something altogether more complicated. Traditionally, a scientist can use past knowledge of the data to model it appropriately, but what if we don't have this information? What if we want to have the computer find a model for the data on its own?

This is where symbolic regression comes in. Symbolic regression refers to fitting a model to the data, without defining the form of the solution ahead of time. A system which performs symbolic regression will determine both the form and the coefficients on its own. This is a classic application of GP and GP has proven its effectiveness on this problem in almost every implementation.

Applying GP to symbolic regression, we could simply input a set of data points and get a function that fits the data appropriately, such as f(x) = x3 + 2 x2 - x + 1.


Suppose you were in a small cart rolling along a straight path at some speed. Your only means of controlling the cart are to fire your thrusters forwards or backwards. You want to stop the cart at the center of the path as quickly as possible. To do so, you would need to analyze your current speed and position and determine the optimal thruster direction using some mathematical formulae. Mathematicians have done this and discovered a time-ideal solution to the problem. But can a computer compete with man here?

...and the answer is yes! This is one problem GP solves quite easily. Evolving individuals to solve this problems means coming up with a decision strategy. Based on the current velocity and position, the individual makes a decision to accelerate forwards or backwards.

This is a basic example of decision-making automaton. The individuals must make decisions given environmental information. Though the problem is rather simple, the concepts can be extended to making all kinds of decisions, taking into account a wealth of information. For instance, we might consider expanding this problem to multiple dimensions or incorporating an uneven track.

Santa Fe Ant Trail

In wild populations, insect populations sometimes exhibit very deterministic behavior. It seems that ants are able to handle everything that comes their way with simple sets of rules and tendencies. Let us simplify the problem further, and consider a single ant navigating a trail of food. The trail is not continuous, there are occasional gaps at particularly uneven locations. To make it even harder for the ant, the ant's only sense is whether or not there is food directly in front of it.

The picture to the left shows the Santa Fe Ant Trail. The Santa Fe Ant Trail problem is used throughout work in artificial intelligence and has a number of known solutions. In the picture, the white dot represents the start of the trail and the red dot represents the end. Pink square represent grid positions with food and green square represent empty grid positions.

Evolving an individual that solves this problem is another exercise in decision-making automaton. The individuals receive environmental information of whether or not there is food directly in front of it and can move forward, turn left, turn right, or perform any combination of these moves. The goal is to eat as much food as possible in a given amount of time. This is another problem that GP has proven successful in tackling, which makes it great for an introductory work on GP.

Throughout this guide, we will be referring to these simple problems with regards to various aspects of GP. It is important to note that these problems are intentionally very simple so that GP runs do not need to be particularly large to produce high quality results. You will learn throughout this guide about run sizes and you will find that using GP, these problems can be solved in minutes on a home computer, whereas truly useful problems usually require supercomputer clusters running day and night to solve.

Now that you understand these three simple problems, let us move on to how GP represents individuals, Representation.