Neural networks and genetic algorithms I
When we try to apply a neural network to a given problem, finding the most suitable topology can be a tedious trial-and-error task, and may still end up producing a poorly optimized network. To automate this process, we can draw on evolutionary algorithms, inspired by the natural selection of living organisms, which can greatly facilitate our job.
In this kind of procedure, we first define elements that describe how to generate the objects whose structure and properties we intend to optimize; we call these elements genes. In the case of a neural network, a gene can define many parameters: for instance, the number of layers of neurons, the number of neurons in each layer, the type of activation functions and their parameters, or even the weights of the inputs of each single neuron in the net. The more parameters the genes determine, the more complicated the process will be and the longer it will take, so it is better not to be too ambitious at the beginning.
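As a minimal sketch of what a gene might look like, we can represent it as a dictionary of hyperparameters, each sampled from an allowed range. The parameter names and ranges here are purely illustrative, not prescriptive:

```python
import random

# A hypothetical gene: a dictionary of the hyperparameters we want to evolve.
# The parameter names and ranges below are illustrative, not prescriptive.
PARAM_RANGES = {
    "num_hidden_layers": (1, 4),                 # numeric range (inclusive)
    "neurons_per_layer": (2, 64),
    "activation": ["relu", "tanh", "sigmoid"],   # categorical choice
}

def random_gene(rng=random):
    """Build one gene by sampling each parameter from its allowed values."""
    gene = {}
    for name, spec in PARAM_RANGES.items():
        if isinstance(spec, tuple):
            gene[name] = rng.randint(spec[0], spec[1])
        else:
            gene[name] = rng.choice(spec)
    return gene
```

Keeping the gene as a plain mapping makes it easy to add or remove parameters later without touching the rest of the algorithm.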
Each gene therefore produces a neural network with a different configuration. This gives us a set of neural networks, called a population; we have to evaluate each one in order to determine which configurations produce better results. This means that we need to train each network with the training data set and rank them in some way; for this, we define a cost function, which returns a single value, the cost of the network. The goal is to find a network whose cost is minimal. The calculation of the cost has no fixed rules; it depends on our preferences and on the nature of the problem we are dealing with. If we are interested in a network with the fewest possible neurons, we can use that value to penalize the cost function; the number of hidden layers can be used too. Other values of interest are those related to the network errors: the minimum, maximum, median and mean errors, the standard deviation of the errors, or the rate of failures on the training sample. We assign a weight to each of the values used in the calculation, depending on its importance, and finally sum all of them to obtain the final cost of the network.
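The weighted sum described above can be sketched in a few lines. The measurements and weights below are hypothetical values, just to show the mechanics:

```python
def network_cost(stats, weights):
    """Weighted sum of penalty terms: each measured value is multiplied by
    the weight expressing how much we care about it; lower cost is better."""
    return sum(weights[key] * stats[key] for key in weights)

# Hypothetical measurements taken after training one candidate network.
stats = {"mean_error": 0.12, "max_error": 0.40, "num_neurons": 30, "failure_rate": 0.05}
# Hypothetical weights reflecting our priorities for this problem.
weights = {"mean_error": 10.0, "max_error": 2.0, "num_neurons": 0.01, "failure_rate": 5.0}
cost = network_cost(stats, weights)   # 1.2 + 0.8 + 0.3 + 0.25
```

Tuning these weights is itself a design decision: a large weight on `num_neurons`, for example, will push the search toward smaller networks even at the expense of some accuracy.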
In the first iteration of the system, we randomly generate a gene population. Then we evaluate each gene, so that we can order them from lowest to highest cost. This is what is known as the first generation, but the algorithm does not end here; as in the reproduction of living beings, which inspires this kind of procedure, we must select the best individuals and obtain a new generation, preserving the better ones and discarding the worse. For this, we can perform different operations on the current genes to obtain new ones: crossing and mutation.
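The first-generation step can be sketched as follows; the gene and cost function here are toy stand-ins (a real `evaluate` would train each network before scoring it):

```python
import random

def first_generation(pop_size, make_gene, evaluate, rng):
    """Generate a random population of genes and rank it from lowest to highest cost."""
    population = [make_gene(rng) for _ in range(pop_size)]
    population.sort(key=evaluate)
    return population

# Toy stand-ins: a gene is a list of three layer sizes, and the "cost" simply
# favors smaller networks. A real evaluate() would train the network first.
rng = random.Random(42)
make_gene = lambda r: [r.randint(2, 32) for _ in range(3)]
ranked = first_generation(10, make_gene, sum, rng)
```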
Gene crossing is the procedure of taking two different genes and generating a new one that contains parts of both. To do this, we decide by means of some rule which parameter values of the first and second gene we will use to construct the new one. For example, if the genes define the number of neurons in each of the six layers of a given neural network, we could create a new gene with the values of the first three layers taken from one gene and those of the last three taken from the other. This operation can follow fixed rules or be decided according to some random value. There is no a priori criterion, although it is convenient to ensure a certain diversity in the genes of the new generations.
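A common way to implement this is single-point crossover, cutting both parents at a random position; the six-layer genes below match the example above:

```python
import random

def crossover(parent_a, parent_b, rng=random):
    """Single-point crossover: the child takes the head of one parent
    and the tail of the other, cut at a random position."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

# Two genes encoding the neurons per layer of a six-layer network.
child = crossover([8, 16, 32, 4, 4, 2], [4, 4, 8, 16, 16, 8], random.Random(0))
```

Other schemes (two cut points, or choosing each position independently from either parent) are equally valid; what matters is that the child mixes material from both parents.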
Mutation consists of modifying some of the parameters of a gene, either by a small amount or by changing them to completely different values. Normally, we perform this operation only on a small number of occasions, with a low probability, but the rules are not fixed, and they depend on the particular problem you are dealing with. Mutation helps to diversify the population by trying new, slightly different configurations.
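A sketch of such a mutation operator, covering both variants mentioned above (a small nudge or a complete replacement), might look like this; the rate and value bounds are arbitrary choices for illustration:

```python
import random

def mutate(gene, rate=0.1, low=1, high=64, rng=random):
    """With probability `rate` per position, either nudge the value slightly
    or replace it with a completely new random one."""
    mutated = list(gene)
    for i in range(len(mutated)):
        if rng.random() < rate:
            if rng.random() < 0.5:
                mutated[i] = max(low, mutated[i] + rng.choice([-1, 1]))  # small change
            else:
                mutated[i] = rng.randint(low, high)                      # full replacement
    return mutated
```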
You can perform these two operations on the best genes, between the best ones and the rest of the population, or on randomly selected genes. We conserve the best individuals of the current generation and eliminate the worst ones, so that, in the end, we have a population with the same number of individuals as the initial one. With this new population, we repeat the process, and so on. As I said before, it is important to have enough genetic diversity in the population, since the objective of this type of algorithm is to explore a space of solutions that is usually very large; a poorly designed system can end up exploring only a small part of this space. It may also be useful to create some genes completely at random, as in the first generation, in order to introduce entirely new genes into the population.
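Putting the pieces together, one possible generational step keeps a few elite genes, injects some brand-new random ones for diversity, and fills the rest by crossing and mutating parents from the better half of the ranking. The toy helpers below stand in for the real operators:

```python
import random

def next_generation(ranked, evaluate, cross, mutate, make_gene,
                    elite=2, fresh=1, rng=random):
    """Build a new population of the same size: keep the `elite` best genes,
    inject `fresh` brand-new random genes to preserve diversity, and fill the
    rest by crossing and mutating parents drawn from the better half."""
    size = len(ranked)
    new_pop = ranked[:elite] + [make_gene(rng) for _ in range(fresh)]
    while len(new_pop) < size:
        a, b = rng.sample(ranked[: size // 2], 2)
        new_pop.append(mutate(cross(a, b, rng), rng))
    new_pop.sort(key=evaluate)
    return new_pop

# Toy run: genes are lists of layer sizes and the cost is just their sum
# (a real system would train each network before computing its cost).
rng = random.Random(7)
make_gene = lambda r: [r.randint(1, 32) for _ in range(4)]
cross = lambda a, b, r: a[: len(a) // 2] + b[len(b) // 2 :]
mut = lambda g, r: [v if r.random() > 0.1 else r.randint(1, 32) for v in g]
pop = sorted((make_gene(rng) for _ in range(10)), key=sum)
for _ in range(5):
    pop = next_generation(pop, sum, cross, mut, make_gene, rng=rng)
```

Because the elite genes are carried over unchanged, the best cost can never get worse from one generation to the next.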
Obviously, all this involves a lot of processing time. Even if we do not carry out an in-depth training of each network, it is necessary to perform enough iterations (of the backpropagation algorithm, for example) to get reasonably accurate values for the cost function. The maximum population size and the training parameters should be selected bearing in mind the time available to obtain the results.
Since the system will probably spend on each network only the minimum training time that gives us a rough idea of its efficiency, it is convenient to re-estimate the cost of every gene in each generation, even those we have already evaluated in previous generations. The reason is that the efficiency reached by a network can depend strongly on the initial values of the neuron weights, which are usually assigned at random. In this way, we improve the ability of the process to determine the best topologies and parameters at the cost of just a little more time.
To delve deeper into the subject, I can recommend the book Genetic Algorithms, by K.F. Man, K.S. Tang and S. Kwong, which contains a chapter dedicated precisely to the training of neural networks using evolutionary algorithms. I have prepared a simple application as an example implementation of this procedure, which I will explain in the second article in this series.