Saturday, 27 April 2013

77. Artificial Life

With the advent of artificial life, we may be the first creatures to create our own successors. . . If we fail in our task as creators, they may indeed be cold and malevolent. However, if we succeed, they may be glorious, enlightened creatures that far surpass us in their intelligence and wisdom. It is quite possible that, when conscious beings of the future look back on this era, we will be most noteworthy not in and of ourselves but rather for what we gave rise to. Artificial life is potentially the most beautiful creation of humanity (Doyne Farmer and Alletta Belin).

Christopher Langton is the main originator of the subject of artificial life. The term artificial life (AL, or Alife) was coined by him in the late 1980s: AL is ‘…an inclusive paradigm that attempts to realize lifelike behaviour by imitating the processes that occur in the development or mechanics of life’.

In the more familiar field of artificial intelligence (AI) one uses computers to model neuropsychology. Likewise, in the field of AL one uses computers to model the basic biological mechanisms of evolution and life (Heudin 1999). In abstracting the basic life processes, the AL approach emphasizes the fact that life is not a property of matter per se, but of the organization of that matter. The laws of life must be laws of dynamical form, independent of the details of a particular carbon-based chemistry that just happened to arise here on Earth. AL attempts to explore other possible biologies in new media, namely computers and robots.

The idea is to view life-as-we-know-it in the context of life-as-it-could-be. There was a recent report of how Lee Cronin of the University of Glasgow could create lifelike cells out of metal.

In conventional biology one tries to understand life phenomena by a process of analysis: We take a living community or organism, and try to make sense of it by subdividing it into its building blocks. By contrast, AL takes the synthesis or bottom-up route. We start with an assembly of very simple interacting units, and see how they evolve under a given set of conditions, and how they change when the environmental conditions are changed.

One of the most striking characteristics of a living organism is the distinction between its genotype and phenotype. The genotype can be thought of as a collection of little computer programs, running in parallel, one program per gene. When activated, each of these programs enters into the logical fray by competing and/or cooperating with the other active programs. And, collectively, these interacting programs carry out an overall computation that is the phenotype. The system evolves towards the best solution of a posed problem.

By analogy, the term GTYPE is introduced in the field of AL to refer to any collection of low-level rules. Similarly, PTYPE means the structure and/or behaviour that results (emerges) when these rules are activated in a specific environment.

What makes life and brain and mind possible is a certain kind of balance between the forces of order and the forces of disorder. In other words, there should be an edge-of-chaos existence. Only such systems are both stable enough to store information, and yet evanescent enough to transmit it.

Life is not just like a computation; life literally is computation. And once we have consciously made a link between life and computation, an immense amount of computational theory can be brought in. For example, the question ‘Why is life full of surprises?’ is answered in terms of the undecidability theorem of computer science, according to which, unless a computer program is utterly trivial, the fastest way to find out what it would do (does it have bugs or not?) is to actually run it and see. This explains why, although a biochemical machine or an AL machine is completely under the control of a program (the GTYPE), it still has surprising, spontaneous behaviour in the PTYPE. It never reaches equilibrium, and there is perpetual novelty.

The computational aspect of the AL approach invokes the theory of complex dynamical systems (Wadhawan 2010). Such systems can be described at various levels of complexity, the global properties at one level emerging from the interactions among a large number of simple elements at the next lower level of complexity. The exact nature of the emergence is, of course, unpredictable because of the extreme nonlinearities involved.

Life arose from an interplay of the blind forces of Nature, with one thing leading to another. There was no 'Designer' involved. In 1986 Richard Dawkins published his famous book The Blind Watchmaker, which was later accompanied by the Blind Watchmaker applet. The computer code mimics Nature's power of cumulative Darwinian natural selection in the evolution of life. Of course, it is not possible to simulate all the complex interactions at work in a real system in Nature. But the applet captures the essence of the process by introducing slight variations in each generation of the population, the 'selection' being determined by the whims and fancies of the user. The result is an endless variety of 'biomorphs'.
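Dawkins' book also described the famous 'weasel' thought experiment, in which cumulative selection rapidly assembles a target sentence from random mutations, whereas pure chance would take essentially forever. Here is a minimal Python sketch of that idea (the target string, mutation rate, and population size are illustrative choices, not Dawkins' exact parameters):

```python
import random

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def fitness(s):
    # Number of characters matching the target at the same position.
    return sum(a == b for a, b in zip(s, TARGET))

def mutate(s, rate=0.05):
    # Each character has a small chance of being replaced at random.
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in s)

def evolve(pop_size=100, seed=0):
    # Cumulative selection: breed from the fittest string in each generation.
    random.seed(seed)
    parent = "".join(random.choice(ALPHABET) for _ in TARGET)
    generation = 0
    while parent != TARGET:
        offspring = [mutate(parent) for _ in range(pop_size)]
        # Keep the parent in the pool so fitness never decreases.
        parent = max(offspring + [parent], key=fitness)
        generation += 1
    return generation

print(evolve())  # number of generations needed (typically a few hundred at most)
```

The point of the exercise is that selecting and accumulating small improvements is astronomically faster than waiting for the whole string to appear by chance in one step.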

Ray Kurzweil has predicted a future with direct brain-to-computer access and 'conscious' machines.

Truly exciting times are ahead! It's a jolly good idea to be a science-literate person.

Saturday, 20 April 2013

76. Genetic Programming: Evolution of Computer Programs

'Further research into intelligence of machinery will probably be very greatly concerned with "searches" ...' (Turing 1948).

'We cannot expect to find a good child-machine at the first attempt.  One must experiment with teaching one such machine and see how well it learns.  One can then try another and see if it is better or worse.  There is an obvious connection between this process and evolution, by the identifications' (Turing 1950).

Evolutionary searches through a phase space can be made, not only for finding solutions using a suitably written computer program (a 'genetic algorithm'), but also for evolving the computer programs themselves. Genetic programming (GP) is about making computers evolve their own algorithms. Genetic programming now routinely delivers high-return, 'human-competitive' machine intelligence.

But why should this be attempted at all?

As I stated in Part 74, the computations occurring in the human brain are of a massively parallel nature. Any ambition of mimicking the human brain must therefore involve writing programs with parallel computing on a very large scale. But this is not something humans can realistically do by hand. I quote Tom Ray:
The complexity of programming a massively parallel machine is probably beyond us. I don't think we will ever be able to write software that fully uses the capacity of parallelism.
And it is not just a question of mimicking the human brain. For any highly complex system (and the technological world abounds in them), it is very difficult, well-nigh impossible, to write efficient, versatile, and robust computer programs. The problem becomes particularly acute when a large amount of parallel programming is essential. It appears that the only hope lies in letting computer codes evolve, just as the solution of a problem evolves under a fixed program in a genetic algorithm (GA).

We have to breed software, which can be turned on itself, ad infinitum, so that it evolves to a desired end.

Such breeding of software becomes a necessity when we are trying to solve some extremely difficult problems in technology development. An example is the development of a multimillion-line program for automatic flying of aircraft. This is an example of wanting our machines to solve problems we do not know how to solve, but merely know how to state. It is impossible to make such huge programs bug-free. And just one bug may be enough to cause the plane to crash! We do not want that to happen.

Learning from ant colonies is one way to proceed. But a more general way is to attempt GP.

Here are some characteristics of GP for massively parallel computer codes:
  • There is an absence of central command. After all, no human is doing the coding. Only the problem is stated, and some 'boundary conditions' are specified.
  • The massiveness and distributedness of such codes ensure that even if there is a bug or a local crash, the system carries on, after some minor readjustment.
  • There is incremental expansion, followed by testing at each such 'modular' stage.
  • Parasites, viruses, and adversaries are deliberately introduced at the testing stage, so that the system builds in the ability to cope with such eventualities. This is the equivalent of vaccination in living beings.
Naturally, such codes develop the all-important robustness feature. This is crucial when one is trying to develop, say, a code for flying a fully computer-controlled aircraft. We cannot afford a programming failure. [Think of the beehive. There is enough redundancy that even if a portion of the hive or population is decimated, the complex adaptive superorganism makes adjustments, and carries on regardless.]

Typically, one begins with a population of competing computer programs. Variations (including mutations) are introduced, as also the equivalent of crossover in biological reproduction (cf. Part 75). The variations are made heritable. Fitness criteria are defined, and the individual programs are made to compete for survival. The fitter programs are assigned a larger chance of surviving and reproducing in the next cycle of the evolutionary operation.

Each of the competing algorithms is normally a sequence of commands. What is being attempted, in effect, is that during the process of evolution the sequence of commands is shuffled and the commands are tampered with in other ways. This process is continued till the right combination of everything is achieved.

You might be wondering how we can tinker with a computer program and still hope that it will remain functional. Even the slightest of errors in writing, say, a Fortran code can make it impossible to run.

Yes; conventional algorithms are fragile. Even a minor crossover or mutation of a command in an algorithm can amount to a mutilation, resulting in a totally nonsensical or illegal statement and making the computer program crash. In principle one could get over this problem by assigning zero fitness to all illegal algorithms, but then most of the members of the population would be declared unfit, and the evolution process would come to a halt because it would not have the all-important benefit of continual variety.

The way to make computer programs less fragile lies in giving them a tree structure, rather than a sequential structure. The tree structure can make the algorithm robust enough to withstand the depredations of crossover and mutation etc. with a certain degree of resilience. More details can be found in Sipper (2002), and Konar (2005).


As an example, consider a tree-structured program for the function f(X, Y) = (2.2 – (X/11)) + (7 * cos(Y)).
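Such a program tree can be written down and evaluated directly; here is a rough Python sketch (representing the tree as nested tuples is my own illustrative choice):

```python
import math

# The program f(X, Y) = (2.2 - (X / 11)) + (7 * cos(Y)) as an expression tree:
# each internal node is (operator, child, ...); each leaf is a constant or a variable name.
TREE = ('+',
        ('-', 2.2, ('/', 'X', 11.0)),
        ('*', 7.0, ('cos', 'Y')))

def evaluate(node, env):
    # Leaves: numbers evaluate to themselves, variable names are looked up in env.
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, str):
        return env[node]
    op, *args = node
    vals = [evaluate(a, env) for a in args]
    ops = {'+': lambda a, b: a + b,
           '-': lambda a, b: a - b,
           '*': lambda a, b: a * b,
           '/': lambda a, b: a / b,
           'cos': lambda a: math.cos(a)}
    return ops[op](*vals)

print(evaluate(TREE, {'X': 11.0, 'Y': 0.0}))  # (2.2 - 1) + 7*cos(0), i.e. about 8.2
```

Every subtree is itself a complete, evaluable program, and that is exactly what makes the tree representation robust under the shuffling that evolution performs.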

Crossover between two parent programs is then carried out by picking a node in each parent's tree and swapping the subtrees rooted at those nodes.
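Subtree crossover on such tuple-based trees can be sketched in Python (the tree representation and helper names are my own illustrative choices):

```python
import random

def all_paths(node, path=()):
    # Yield the path of child indices to every node in the tree.
    yield path
    if isinstance(node, tuple):
        for i, child in enumerate(node[1:], start=1):
            yield from all_paths(child, path + (i,))

def get(node, path):
    # Fetch the subtree at the given path.
    for i in path:
        node = node[i]
    return node

def replace(node, path, sub):
    # Rebuild the tree with the subtree at `path` swapped for `sub`.
    if not path:
        return sub
    i, rest = path[0], path[1:]
    return node[:i] + (replace(node[i], rest, sub),) + node[i + 1:]

def crossover(parent_a, parent_b, rng=random):
    # Pick a random crossover point in each parent and exchange the subtrees.
    pa = rng.choice(list(all_paths(parent_a)))
    pb = rng.choice(list(all_paths(parent_b)))
    child_a = replace(parent_a, pa, get(parent_b, pb))
    child_b = replace(parent_b, pb, get(parent_a, pa))
    return child_a, child_b

a = ('+', ('*', 'X', 2.0), 3.0)
b = ('-', ('cos', 'Y'), ('/', 'X', 5.0))
print(crossover(a, b, random.Random(0)))
```

Because any subtree is a legal program, the children are always syntactically valid, unlike what happens when you splice two raw command sequences together.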

Such evolutionary or 'bio-inspired' computing has also been used for evolving sophisticated analogue electrical circuits.

'Gene-expression programming' is an important variation on GP.

Saturday, 13 April 2013

75. Genetic Algorithms

Holland (1975) introduced the formalism of genetic algorithms (GAs) by analogy with how biological evolution occurs in Nature. Deep down, a computer program is nothing but a string of 1s and 0s, something like 110101010110000101001001….. This is similar to how genes are laid out along the length of a DNA molecule. We can think of each binary digit as a ‘gene’, and a string of such genes as a digital ‘chromosome’. For example, chromosome A may be 1101, B may be 0111, etc. In a GA, an individual in the population is represented schematically as the sequence of its chromosomes, say ABCDEF. Another individual may be abcdef, a third may be aBCdeF, etc.

The essence of Darwinian evolution is that, in a population, the fittest have a larger likelihood of survival and propagation. In computational terms, it amounts to maximizing some mathematical function representing 'fitness'.

It is important to remember that whereas Darwinian evolution is an open-ended and blind process, GAs have a goal. GAs are meant to solve particular pre-conceived problems.

For solving a maximization problem (e.g. for finding the maximum possible value of a complicated function (say fitness) of an artificial genome), the steps involved are typically as follows:

1. We first let the computer produce a population of, say, 1000 individuals, each represented by a randomly generated digital chromosome.

2. The next step is to test the relative fitness of each individual (represented entirely by the corresponding chromosome) regarding its effectiveness in maximizing the function under consideration, i.e. the fitness function. A score is given for the fitness, say on a scale of 1 to 10. In biological terms, the fitness is a probabilistic measure of the reproductive success of the individual. The higher the fitness, the greater is the chance that the individual will be selected (by us) for the next cycle of reproduction.

3. Mutations are introduced occasionally in a digital chromosome by arbitrarily flipping a 1 to 0, or a 0 to 1.

4. The next step in the GA is to take (in a probabilistic manner) those individual digital chromosomes that have high levels of fitness, and produce a new generation of individuals by a process of digital sexual reproduction or crossover. The GA chooses pairs of individuals, say ABCDEF and abcdef, and produces two new individuals, say, ABCdef and abcDEF.

5. The new generation of digital individuals produced is again subjected to the entire cycle of gene expression, fitness testing, selection, mutation, and crossover.

6. These cycles are repeated a large number of times, till the desired optimization or maximization problem has been solved.
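The six steps above can be sketched as a minimal GA in Python. The fitness function here is a toy (OneMax, i.e. the number of 1-bits in the chromosome), and the population size, mutation rate, and tournament-style selection are illustrative choices:

```python
import random

def fitness(chrom):
    # OneMax: count the 1-bits; a stand-in for any fitness function.
    return sum(chrom)

def crossover(a, b, rng):
    # Single-point crossover: ABCDEF x abcdef -> ABCdef and abcDEF.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chrom, rng, rate=0.01):
    # Occasionally flip a 1 to a 0, or a 0 to a 1.
    return [bit ^ 1 if rng.random() < rate else bit for bit in chrom]

def genetic_algorithm(n_bits=20, pop_size=50, generations=100, seed=0):
    rng = random.Random(seed)
    # Step 1: a population of randomly generated digital chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Steps 2 and 4: fitter individuals get a larger chance of reproducing
        # (here via 3-way tournament selection).
        def select():
            return max(rng.sample(pop, 3), key=fitness)
        next_pop = []
        while len(next_pop) < pop_size:
            c1, c2 = crossover(select(), select(), rng)
            # Step 3: occasional mutations in the offspring.
            next_pop += [mutate(c1, rng), mutate(c2, rng)]
        # Steps 5 and 6: the new generation goes through the whole cycle again.
        pop = next_pop
    return max(pop, key=fitness)

best = genetic_algorithm()
print(fitness(best))  # typically reaches the maximum possible score, 20
```

Swapping in a harder fitness function (and an encoding of the problem's variables as bits) is all it takes to aim this machinery at a real optimization problem.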

The sexual crossover in reproductive biology, as also in the artificial GA, serves two purposes. First, it provides a chance for the appearance of new individuals in the population which may be fitter than any earlier individual. Secondly, it provides a mechanism for the existence of clusters of genes (called building blocks) which are particularly well-suited for occurring together because they result in higher-than-average fitness for any individual possessing them.

Holland used the symbol # to denote ‘does not matter’; i.e. the corresponding digital gene may be either 0 or 1. Suppose it is found that certain patterns of genes turn out to have high fitness; e.g. 111##10####1#10, or 0##1000##1111##, etc. Such patterns function as building blocks with a high potential for fitness.
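The ‘#’ notation is easy to express in code; here is a tiny Python sketch of the matching rule (the example schema and chromosomes are illustrative):

```python
def matches(schema, chrom):
    # '#' means "does not matter"; every other position must match exactly.
    return all(s == '#' or s == c for s, c in zip(schema, chrom))

print(matches("111##10", "1110110"))  # True: positions 3 and 4 are free
print(matches("111##10", "0110110"))  # False: position 0 disagrees
```

A schema thus names a whole family of chromosomes at once, which is what lets Holland reason about building blocks rather than individual bit strings.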

What is more, since the population can shuffle its genetic material in every generation through sexual reproduction, new building blocks, as well as new combinations of existing building blocks, can arise. Thus the GA quickly creates individuals with an ever-increasing number of ‘good’ building blocks (the 'bad' building blocks get gradually eliminated by natural selection). If there is a survival advantage to the population, the individuals that have the good building blocks spread rapidly, and the GA converges to the solution rapidly (a case of positive feedback).

By the mid-1960s, Holland had proved his fundamental schema theorem:

In the presence of reproduction, crossover and mutation, almost any compact cluster of genes that provides above-average fitness will grow in the population exponentially.

‘Schema’ was the term used by Holland for any specific pattern of genes (also see Nolfi and Floreano 2000).

Efficient drug design based on evolutionary principles is an example of GAs in action. Often, for a drug to be effective, its molecular structure should be such that it can fit snugly into a relevant cleft in a relevant protein molecule. It can be very expensive to actually synthesize all those trial drugs and test their compatibility with the protein-molecule cleft. In the GA approach the computer code generates billions of random 'molecules', which it tests against the cleft in the protein. One such imaginary molecule may contain a site which matches one of, say, six sites on the cleft. This molecule is 'selected', and a billion variations of it are created and tested. And so on.

Here is another example of application of GAs, this time from materials science. Giro, Cyrillo and Galvao (2002) designed conducting polymers by the GA approach. Copolymerization is one way of developing new polymers. Because of the huge number of possible configurations, and because of the ever-present possibility of structural disorder, the task becomes daunting when one wants to investigate theoretically the various ways of combining two or more monomers to design a copolymer with the desired properties. Giro et al. have described the successful implementation of the task of designing binary and ternary disordered copolymers of conducting polyaniline by employing GAs.