Holland (1975) introduced the formalism of genetic algorithms (GAs) by analogy
with how biological evolution occurs in Nature. Deep down under, a computer
program is nothing but a string of 1s and 0s, something like
110101010110000101001001….. This is similar to how chromosomes are laid out along the length of a DNA
molecule. We can think of each binary digit as a ‘gene’, and a string of such
genes as a digital ‘chromosome’. For example, chromosome A may be 1101, B may
be 0111, etc. In a GA, an individual in the population is represented
schematically as the sequence of its chromosomes, say ABCDEF. Another
individual may be abcdef, a third may be aBCdeF, etc.
The essence of
Darwinian evolution is that, in a population, the fittest have a
larger likelihood of survival and propagation. In computational terms, it
amounts to maximizing some mathematical function representing 'fitness'.
It is
important to remember that whereas Darwinian evolution is an open-ended and blind,
process, GAs have a goal. GAs
are meant to solve particular pre-conceived problems.
For solving a maximization
problem (e.g. for finding the maximum possible value of a complicated function
(say fitness) of an artificial genome), the steps involved are typically as
follows:
1. We
first let the computer produce a population of, say, 1000 individuals, each
represented by a randomly generated digital chromosome.
2. The
next step is to test the relative fitness of each individual (represented
entirely by the corresponding chromosome) regarding its effectiveness in
maximizing the function under consideration; e.g. the fitness function. A score
is given for the fitness, say on a scale of 1 to 10. In biological terms, the
fitness is a probabilistic measure of the reproductive success of the
individual. The higher the fitness, the greater is the chance that the
individual will be selected (by us) for the next cycle of reproduction.
3. Mutations
are introduced occasionally in a digital chromosome by arbitrarily flipping a 1
to 0, or a 0 to 1.
4. The
next step in the GA is to take (in a probabilistic manner) those individual
digital chromosomes that have high levels of fitness, and produce a new
generation of individuals by a process of digital sexual reproduction or crossover.
The GA chooses pairs of individuals, say ABCDEF and abcdef, and produces two
new individuals, say, ABCdef and abcDEF.
5. The
new generation of digital individuals produced is again subjected to the entire
cycle of gene expression, fitness testing, selection, mutation,
and crossover.
6. These
cycles are repeated a large number of times, till the desired optimization or
maximization problem has been solved.
The sexual
crossover in reproductive biology, as also in the artificial GA, serves two
purposes. It provides a chance for the appearance of new individuals in
the population which may be fitter than any earlier individual. Secondly, a
provides a mechanism for the existence of clusters of genes (called building blocks) which are particularly
well-suited for occurring together because they result in
higher-than-average fitness for any individual possessing them.
Holland used
the symbol # to denote ‘does not matter’; i.e. the corresponding digital gene may
be either 0 or 1. Suppose it is found that certain patterns of genes turn out
to have high fitness; e.g. 111##10####1#10, or 0##1000##1111##, etc. Such
patterns function as building blocks with a high potential for fitness.
What is more,
since the population can shuffle its genetic material in every generation
through sexual reproduction, new building blocks, as well as new combinations
of existing building blocks, can arise. Thus the GA quickly creates individuals
with an ever-increasing number of ‘good’ building blocks (the 'bad' building
blocks get gradually eliminated by natural selection). If there is a survival
advantage to the population, the individuals that have the good building blocks
spread rapidly, and the GA converges to the solution rapidly (a case of
positive feedback).
By the
mid-1960s, Holland had proved his fundamental schema theorem:
In
the presence of reproduction, crossover and mutation, almost any compact
cluster of genes that provides above-average fitness will grow in the population
exponentially.
‘Schema’ was
the term used by Holland for any specific pattern of genes (also see Nolfi and Floreano 2000).
Efficient drug design based on evolutionary principles is an
example of GAs in action. Often, for a drug to be effective, its molecular
structure should be such that it can fit snugly into a relevant cleft in a
relevant protein molecule. It can be very expensive to actually synthesize all
those trial drugs and test their compatibility with the protein-molecule cleft.
In the GA approach the computer code generates billions of random 'molecules',
which it tests against the cleft in the protein. One such imaginary molecule
may contain a site which matches one of, say, six sites on the cleft. This
molecule is 'selected', and a billion variations of it are created, and tested.
So on.
Here is another example of application of
GAs, this time from materials science. Giro, Cyrillo and Galvao (2002) designed
conducting polymers by
the GA approach. Copolymerization is
one way of developing new polymers. Because of the huge number of possible
configurations, and because of the ever-present possibility of structural
disorder, the task becomes daunting when one wants to investigate theoretically
the various ways of combining two or more monomers to design a copolymer with the
desired properties. Giro et al. have described the successful
implementation of the task of designing binary and ternary disordered
copolymers of conducting polyaniline by employing GAs.
No comments:
Post a Comment