The Initial Population

To this point in the guide, you have built a base of knowledge of how individuals are represented and assessed in GP. We can now begin to look at how GP actually goes about evolving creative solutions to complex problems.

You may recall from the flowchart in the What is Genetic Programming? section that the first step to any GP run is to Randomly Generate Initial Population. While conceptually, this may be easy to grasp, but in practice, it is slightly more complex.

There are many ways to generate the initial population. The two most popular methods, used in most GP systems are called Full and Grow. Both methods generate a completely random tree with some predefined maximum depth.

The Full Method

The full method tends to produce, as the name implies, full/bushy trees. We begin by randomly choosing a function to be our root node. Recall that a function has one or more child nodes. We then recursively assign functions to the child nodes of the root node until the maximum depth has been reached, at which point we choose only terminals. Let's examine the code; an example step-by-step full generation of a tree of max depth 3 is show to the left.

public GPNode generateFull(int maxDepth) {
        GPNode root;
        
        //If we are not at the max depth, choose a function
        if(maxDepth > 1)
                root = NodeFactory.getFunction();
        //Otherwise, choose a terminal
        else
                root = NodeFactory.getTerminal();
        
        //Recursively assign child nodes
        for(int i = 0; i < root.numChildNodes(); i++)
                root.setChildNode(i, generateFull(maxDepth - 1));
        
        return root;
}  

As I mentioned before, trees generated with the full method tend to be larger than trees generated with the grow method, which I will soon describe. For this reason, they tend to be richer and more computationally complex. This can be beneficial in that it may be more effective at solving a problem, though typically more resource-intensive. Structurally, they tend to be more similar to one another than trees generated with grow. One thing to note is that the tree is guarunteed to have depth equal to the maximum depth. Let us examine the other tree generation method.

Grow Method

The grow method is very similar to the full method, in that it is a recursive random generation of a tree with maximum initial depth. The difference between grow and full is that the grow method does not require functions to be chosen at any point. If the maximum depth has not yet been reached, either a function or a terminal may be selected, causing more diverse tree structures with some branches longer than others. To the left is an example generation, similar to the one above, but using the grow method. The code for grow is shown below.

public GPNode generateGrow(int maxDepth) {
        GPNode root;
        
        if(maxDepth > 1)
                root = NodeFactory.getNode();
        //If we are at the max depth, choose a terminal
        else
                root = NodeFactory.getTerminal();
        
        //Recursively assign child nodes
        for(int i = 0; i < root.numChildNodes(); i++)
                root.setChildNode(i, generateGrow(maxDepth - 1));
        
        return root;
}  

Trees generated by the grow method tend to have more diverse structures, and tend to be less computationally complex. They also sometimes have a tendency to terminate prematurely, ie have a depth less than the maximum depth, particularly if the probability of selecting a node is high.

References in the two code segments above are made to a class called NodeFactory. For the purpose of the examples in this guide, NodeFactory is a class that simply selects which nodes to use in various instances. For example, a call to NodeFactory.getFunction() will return any of the function nodes available for the problem. Generally, GP chooses nodes with a uniform distribution, ie every node has an equal chance of being selected. This may not always be the case. If, for some reason, a Genetic Programmer knows that a certain node should appear more often than another, he might choose to assign that node a higher probability of selection. In this guide, we will not examine that case, but you should know that there are many ways to select the nodes we use. It is all up to the Genetic Programmer and varies by application.

Neither of the two tree generation methods described above is particularly better at solving all GP problems. The most commonly used method for initial population is referred to as ramped half-and-half. Half of the individuals are created using the full method and the other half are generated using the grow method. The maximum depth of the trees generated is ramped, such that individuals are created in a range of sizes. Using this method creates a diverse initial population, both in structure and computational complexity.

Another common technique to make initial population generation more effective is to eliminate duplicates in the initial population. This ensures that we are not reprocessing identical individuals, which is a waste of computational resources. It also increases diversity, which is usually beneficial.

So now that we have an initial population, we can begin manipulating it. The process begins with an assessment of the fitness of all individuals in the population. If you forget how fitness works, you can reread the Fitness section. Once the fitnesses are all assessed, we must choose which individuals survive and reproduce, and which ones must leave the population, Selection.