Saturday, April 13, 2013

75. Genetic Algorithms

Holland (1975) introduced the formalism of genetic algorithms (GAs) by analogy with how biological evolution occurs in Nature. Deep down under, a computer program is nothing but a string of 1s and 0s, something like 110101010110000101001001….. This is similar to how chromosomes are laid out along the length of a DNA molecule. We can think of each binary digit as a ‘gene’, and a string of such genes as a digital ‘chromosome’. For example, chromosome A may be 1101, B may be 0111, etc. In a GA, an individual in the population is represented schematically as the sequence of its chromosomes, say ABCDEF. Another individual may be abcdef, a third may be aBCdeF, etc.

The essence of Darwinian evolution is that, in a population, the fittest have a larger likelihood of survival and propagation. In computational terms, it amounts to maximizing some mathematical function representing 'fitness'.

It is important to remember that whereas Darwinian evolution is an open-ended and blind, process, GAs have a goal. GAs are meant to solve particular pre-conceived problems.

For solving a maximization problem (e.g. for finding the maximum possible value of a complicated function (say fitness) of an artificial genome), the steps involved are typically as follows:

1. We first let the computer produce a population of, say, 1000 individuals, each represented by a randomly generated digital chromosome.

2. The next step is to test the relative fitness of each individual (represented entirely by the corresponding chromosome) regarding its effectiveness in maximizing the function under consideration; e.g. the fitness function. A score is given for the fitness, say on a scale of 1 to 10. In biological terms, the fitness is a probabilistic measure of the reproductive success of the individual. The higher the fitness, the greater is the chance that the individual will be selected (by us) for the next cycle of reproduction.

3. Mutations are introduced occasionally in a digital chromosome by arbitrarily flipping a 1 to 0, or a 0 to 1.

4. The next step in the GA is to take (in a probabilistic manner) those individual digital chromosomes that have high levels of fitness, and produce a new generation of individuals by a process of digital sexual reproduction or crossover. The GA chooses pairs of individuals, say ABCDEF and abcdef, and produces two new individuals, say, ABCdef and abcDEF.

5. The new generation of digital individuals produced is again subjected to the entire cycle of gene expression, fitness testing, selection, mutation, and crossover.

6. These cycles are repeated a large number of times, till the desired optimization or maximization problem has been solved.

The sexual crossover in reproductive biology, as also in the artificial GA, serves two purposes. It provides a chance for the appearance of new individuals in the population which may be fitter than any earlier individual. Secondly, a provides a mechanism for the existence of clusters of genes (called building blocks) which are particularly well-suited for occurring together because they result in higher-than-average fitness for any individual possessing them.

Holland used the symbol # to denote ‘does not matter’; i.e. the corresponding digital gene may be either 0 or 1. Suppose it is found that certain patterns of genes turn out to have high fitness; e.g. 111##10####1#10, or 0##1000##1111##, etc. Such patterns function as building blocks with a high potential for fitness.

What is more, since the population can shuffle its genetic material in every generation through sexual reproduction, new building blocks, as well as new combinations of existing building blocks, can arise. Thus the GA quickly creates individuals with an ever-increasing number of ‘good’ building blocks (the 'bad' building blocks get gradually eliminated by natural selection). If there is a survival advantage to the population, the individuals that have the good building blocks spread rapidly, and the GA converges to the solution rapidly (a case of positive feedback).

By the mid-1960s, Holland had proved his fundamental schema theorem:

In the presence of reproduction, crossover and mutation, almost any compact cluster of genes that provides above-average fitness will grow in the population exponentially.

‘Schema’ was the term used by Holland for any specific pattern of genes (also see Nolfi and Floreano 2000).

Efficient drug design based on evolutionary principles is an example of GAs in action. Often, for a drug to be effective, its molecular structure should be such that it can fit snugly into a relevant cleft in a relevant protein molecule. It can be very expensive to actually synthesize all those trial drugs and test their compatibility with the protein-molecule cleft. In the GA approach the computer code generates billions of random 'molecules', which it tests against the cleft in the protein. One such imaginary molecule may contain a site which matches one of, say, six sites on the cleft. This molecule is 'selected', and a billion variations of it are created, and tested. So on.

Here is another example of application of GAs, this time from materials science. Giro, Cyrillo and Galvao (2002) designed conducting polymers by the GA approach. Copolymerization is one way of developing new polymers. Because of the huge number of possible configurations, and because of the ever-present possibility of structural disorder, the task becomes daunting when one wants to investigate theoretically the various ways of combining two or more monomers to design a copolymer with the desired properties. Giro et al. have described the successful implementation of the task of designing binary and ternary disordered copolymers of conducting polyaniline by employing GAs.