Saturday 31 August 2013

95. Zero-Sum Two-Player Games and the Minimax Theorem



In a zero-sum two-player game the payoff is determined by a single number: the amount won by the first player, which is also the amount lost by the second player (or vice versa). If the game is finite, its normal form is a matrix of payoffs, such as the one shown below (Owen 1992).

1       3       5
2*      4       3
0       1      -3

The entry with a star is a 'saddle point' (see below).

Here each row corresponds to the strategies of player I, and each column to those of player II. Suppose player I chooses the strategy corresponding to row 2. The payoff is 2, 4, or 3, depending on the strategy chosen by player II; thus player I can be certain of winning at least 2 units. By contrast, if he chooses row 1, the least he can win is 1 unit. If he chooses row 3, the most he can win is 1 unit, and he may lose as much as 3 units. The row-2 strategy is therefore his 'maximin strategy' (it maximizes his minimum winnings), and the guaranteed payoff of 2 units is his security level. The security level of a player is the largest amount he can be sure of winning, no matter what strategy the other player adopts.

Now consider player II. Since the entries are payments he makes to player I, his worst-case losses for columns 1, 2, and 3 are 2, 4, and 5 units, respectively. Thus column 1 corresponds to his 'minimax strategy' (it gives the minimum of the three maximum losses).

In this example, 2 units is both the maximin and the minimax payoff. By choosing row 2, player I is sure to win at least 2 units, and by choosing column 1, player II is sure to lose no more than 2 units. It is a zero-sum game, and row 2 and column 1 are optimal strategies. A game of this type in which the maximin of one player coincides with the minimax of the other player has a saddle point in the payoff matrix; the saddle-point payoff is 2 units in our example.
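To make this concrete, here is a minimal sketch in Python (my own illustration, not part of the cited text) that computes the pure-strategy maximin and minimax for the matrix above and checks for a saddle point:

    # Payments from player II to player I; rows are player I's strategies,
    # columns are player II's strategies.
    payoff = [[1, 3, 5],
              [2, 4, 3],
              [0, 1, -3]]

    row_minima = [min(row) for row in payoff]          # worst case for each row
    col_maxima = [max(col) for col in zip(*payoff)]    # worst case for each column

    maximin = max(row_minima)   # player I's security level: 2 (row 2)
    minimax = min(col_maxima)   # player II's security level: 2 (column 1)

    if maximin == minimax:
        print('Saddle point; value of the game =', maximin)   # prints 2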


Suppose we make a 3-dimensional plot of the payoff to player I as a function of the two players' strategies plotted along the other two axes. Along player I's strategy axis the optimal point is a peak (he is maximizing his payoff), while along player II's strategy axis it is a valley (he is minimizing what he must pay out). When the maximin and the minimax coincide, the optimal point is therefore a saddle point of this surface. If the two players can be presumed to be rational, they will both arrive at the saddle point without any need to communicate with each other, except through the consequences of the moves played by each.

This is a notable result from the point of view of emergence of swarm intelligence and complex behaviour. Each bee in a beehive is obviously ‘rational’ in the sense that it has no emotions to speak of. Its actions are determined by the innate tendencies (simple rules) dictated by its genes. Although there is an apparent visual ‘communication’ with other bees (the waggle dance), the fact remains that even the response to the dance is nothing more than an operation of simple local rules in a rational manner. The existence of the saddle point, although arrived at above in the context of a zero-sum two-player game, points to the existence of a sustainable level of survival of the bee species, each bee playing the role of a rational player at the individual level.

It was von Neumann who showed in his pioneering work on game theory (cf. Part 91) that for two-person zero-sum games, the idea of rational behaviour can be extended by using the concept of individual rationality (each player guaranteeing himself his security level).

In general, the maximin and minimax strategies may not give the same payoff. This is illustrated below in the payoff matrix for a rudimentary normal form of an elementary game of poker (Owen 1995). Again each row corresponds to a strategy of player I, and each column to a strategy of player II; player I chooses from four possible strategies, and player II from two. Here the maximin is -0.6 (row 2), and the minimax is -0.2 (column 1). The difference is not zero, and represents an indeterminacy of the game.

-1.8     1
-0.2   -0.6
-2.2     1
-0.6   -0.6

In such situations it is advisable for the two players to adopt mixed strategies. The idea is to prevent the other player from guessing one's strategy, by choosing a row or a column according to some randomizing scheme. For example, suppose player I plays row 1 with probability 1/8, row 2 with probability 7/8, and never plays rows 3 and 4. These choices can be represented by a vector x = (1/8, 7/8, 0, 0), and they guarantee player I an expected payoff of at least -0.4; that is, he expects to lose no more than 0.4 units, irrespective of what the opponent plays. Similarly, player II may be advised to play either column with probability 1/2 (y = (1/2, 1/2)); against this choice, player I's expected payoff is at most -0.4.

With these mixed strategies, player I expects to lose no more than 0.4 units, and player II expects to win at least 0.4 units. The number -0.4 is the value of the game.
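As a quick numerical check (a sketch using NumPy; the code is mine, not from the cited text), the expected payoffs quoted above follow directly from the matrix:

    import numpy as np

    A = np.array([[-1.8, 1.0],
                  [-0.2, -0.6],
                  [-2.2, 1.0],
                  [-0.6, -0.6]])
    x = np.array([1/8, 7/8, 0, 0])   # player I's mixed strategy
    y = np.array([1/2, 1/2])         # player II's mixed strategy

    print(x @ A)      # [-0.4 -0.4]: at least -0.4 whichever column player II picks
    print(A @ y)      # [-0.4 -0.4 -0.6 -0.6]: at most -0.4 whichever row player I picks
    print(x @ A @ y)  # -0.4, the value of the game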

We can generalize. Consider a game in which the payoff matrix A is an m × n matrix. The mixed strategy of player I is a vector x = (x_1, x_2, ..., x_m); its components are probabilities, so they are nonnegative and add up to 1. For player II there is a similar mixed-strategy vector y = (y_1, y_2, ..., y_n). The expected payoff when the two players use these mixed strategies can be written as

P(x, y) = x A yᵀ

The maximin in this general case is

V_I = max_x min_y x A yᵀ

And the minimax is

V_II = min_y max_x x A yᵀ

The minimax theorem

The minimax theorem says that for all matrix games,

V_I = V_II.

The common value of the payoff is called the value of the game, and the maximizing x and the minimizing y are the optimal mixed strategies.

Linear programming is the most common technique used for the computation of optimal strategies in a matrix game.
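As an illustration of that approach, here is a small sketch (my own, assuming SciPy's linprog is available) that casts the row player's problem as a linear programme: maximize the guaranteed payoff v subject to the constraint that, against the mixed strategy x, every column of A yields at least v.

    import numpy as np
    from scipy.optimize import linprog

    def solve_matrix_game(A):
        """Value of the game and an optimal mixed strategy for the row player."""
        A = np.asarray(A, dtype=float)
        m, n = A.shape
        # Variables: x_1 ... x_m and v; maximize v, i.e. minimize -v.
        c = np.zeros(m + 1)
        c[-1] = -1.0
        # For every column j:  sum_i x_i A[i, j] >= v,  i.e.  -A[:, j].x + v <= 0.
        A_ub = np.hstack([-A.T, np.ones((n, 1))])
        b_ub = np.zeros(n)
        # The probabilities must sum to 1.
        A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
        b_eq = np.array([1.0])
        bounds = [(0, 1)] * m + [(None, None)]   # v may be negative
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq,
                      b_eq=b_eq, bounds=bounds)
        return res.x[-1], res.x[:m]

    # The poker-like matrix from above; the value should come out as -0.4.
    v, x = solve_matrix_game([[-1.8, 1], [-0.2, -0.6], [-2.2, 1], [-0.6, -0.6]])
    print(round(v, 3), np.round(x, 3))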

Chess is an example of a zero-sum two-player game. There is a theorem in game theory according to which any finite, zero-sum, two-player game like chess has an optimal solution; that is, there is an optimal way of choosing moves in chess that will make a player do at least as well as with any other set of moves. But the number of possible games of chess is huge: ~10^120. The problem is that no one knows what that optimal solution is. One never plays the same game twice.

Suppose there are more than two players in a game; i.e. it is a plural game. In any game, whether dual or plural, the total payoff may be zero (a zero-sum game) or nonzero (a nonzero-sum game). In the latter category, the interests of the players need not always be opposed; i.e. both noncooperation and cooperation are conceivable. I shall focus on noncooperative games in the next post.

Saturday 24 August 2013

94. The Ultimatum Game and the Economics of Fair Play



The real-life experience with the Traveller's Dilemma (TD) game I discussed in Part 93 leads to two conclusions:

1. The most ‘efficient’ way for the two players to play, the one that gives the maximum total payoff, is for each to choose 100, giving the payoff (100, 100), i.e. 200 in total. This goes against the grain of classical economics, which was based on the ‘libertarian presumption’ that if individuals are left to their own selfish devices, the economy would run efficiently.

2. People do not always tend to play the Nash-equilibrium strategy, which in the TD game gives the payoff (2, 2). This violates the assumption of classical economics that people make rational decisions.

Thus people do not always make selfish rational choices. The ultimatum game (UG) illustrates this point further.


Imagine Mr. A and Mr. B, who do not know each other and are in separate rooms, unable to communicate with each other. Somebody plays the ultimatum game with them: he offers $100, which must be shared between them according to certain strict rules, known to both Mr. A and Mr. B. One of them, chosen at random, has to decide how the money is to be divided and makes this offer to the other person. If the other person accepts the offer, the deal is closed. If he rejects the offer, neither of them gets anything. What amount will be offered by the person who has been chosen at random to make the offer?
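The rules are simple enough to write down as a few lines of code. Here is a toy sketch (mine, not from the post; the function and argument names are hypothetical) of a single round:

    def ultimatum_round(total, offer, accepts):
        """Payoffs (proposer, responder) for one round of the ultimatum game.

        total   -- the sum to be divided (e.g. 100 dollars)
        offer   -- the amount the proposer offers to the responder
        accepts -- True if the responder accepts the offer
        """
        if accepts:
            return total - offer, offer
        return 0, 0   # rejection: neither player gets anything

    print(ultimatum_round(100, 40, True))    # (60, 40)
    print(ultimatum_round(100, 1, False))    # (0, 0): the responder punishes a 'too small' offer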

Our first response may be: 50%. But is this what really happens in practice? Suppose Mr. A has been chosen to make the offer to Mr. B. He may very well think that even if he offers 40%, Mr. B is going to accept it, because $40 is much better than getting nothing at all. But then why not offer just $30? How about $20, $10, … , or even $1? Will Mr. B accept $1, which is, after all, better than getting nothing at all, even though Mr. A walks away with $99?


Experimentally, it is found that two-thirds of the offers are between 40% and 50% (Sigmund, Fehr and Nowak 2002). Only about 4% of proposers offer less than 20% of the money; offering such a small share is considered risky because the offer may get rejected. More than 50% of all responders reject offers below 20%. But the question is: why should anyone reject an offer as ‘too small’? Obviously, factors such as a sense of fair play and/or a sense of outrage are at work.

The economics of fair play


Analyses based on Darwinian evolution demonstrate that fairness evolves if the proposer has some information about what deals the responder has accepted in the past (Nowak, Page and Sigmund 2000). Thus, evolution of a sense of fair play, like the evolution of cooperation (Maynard Smith and Price 1973), is linked to reputation. The opposite is also true. If an individual has the reputation of being content with low offers, he is more likely to be offered small shares. [Reminds me of something I learnt early in life: Often you have to train the people around you about how they should behave with you.]

Theoretical classical economics modelled the human being as Homo economicus: a rational and purely selfish individual. Studies on the UG and its variants reveal, however, that humans are actually a cross between H. economicus and H. emoticus. Emotions like generosity, vengefulness, and altruism can sometimes override purely logical and selfish pursuits. How did this trait evolve in the Darwinian sense? There must have been some evolutionary advantage in it for such genes to survive and propagate.


Yes indeed. Unfair play arouses moral indignation and acts of revenge, which can be costly to the social group or the species. Fair play and altruism may not appear to be of immediate benefit to the individual, but they can have long-term benefits for the group or the species. Over several generations, genotypes can evolve whose phenotype is such that, in pairwise encounters, a purely self-centred strategy is not adopted; instead, due account is taken of the other player’s conduct. A player is not interested only in his own payoff, but compares it with that of the other player, and demands fair play.

Such departures from the classical game-theoretic predictions of the UG have been rationalized in various ways. The chief among them is the premise that the strict rules prescribed by the UG are far too removed from what actually happens when real people deal with one another. For example, haggling is not permitted in the UG, whereas it must have been a common theme in the social life of our ancestors. Therefore, real players are likely to forget or ignore that the UG is a one-off game (the deal is either on or off), with no bargains or second chances.

Another factor to consider is the existence of rather strongly knit groups within a species. Such groups provide a sense of belonging, as well as the assurance of protection against enemies or rival groups. Naturally, it makes sense not to outcompete or injure other members of one's own group to such an extent that successful contests with other groups are jeopardized. But while this line of reasoning does explain why proposers offer large sums, it cannot explain why responders reject low offers.

Another approach softens the UG's strict condition of complete secrecy, under which a contestant has no access to information about the other contestant (cf. Sigmund, Fehr and Nowak 2002). The fact is that our social evolution has been such that we expect others to notice our actions and decisions. For example, if a player is known to become angry and reject low offers, others are likely to make him high offers. Therefore, emotional responses to low offers must have been favoured by evolution. In due course, self-esteem entered the picture, making it necessary for a player to reject a low offer ‘as a matter of principle,’ as a matter of dignity.

Other factors which influence the outcome of the UG, and of some of the ‘Public Goods’ games which include a punishment rule, are emotions like a sense of revenge, and satisfaction of punishing selfish action (Sigmund, Fehr and Nowak 2002). Such responses have evolutionary underpinnings which ensure fitness advantage for the social group. In times of war, pestilence, or famine, an above-average proportion of punishers helps the group to survive as a whole. Moral guidelines are an offshoot of such tendencies, and they apply to the economic affairs of the community as well.

Saturday 17 August 2013

93. The Traveller's Dilemma



'There is something rational about choosing not to be rational when playing Traveller’s Dilemma' (Kaushik Basu 2007).

The Traveller’s Dilemma (TD) game was crafted in 1994 by Kaushik Basu. Lucy and Pete go to a remote island in the Pacific, where they purchase two identical antiques. The antiques get damaged during the return journey, and the airline manager offers compensation, except that he has no idea of the actual price. To make sure that the airline does not pay more than is necessary, he comes up with a game plan. He asks each of them to write down, simultaneously and without consulting each other, the price of each antique as any integral number of dollars between 2 and 100, with the following proviso: if both write the same number, he will pay that amount to each of them. But if they write different numbers, he will conclude that the lower price is the correct one, and that the other person is trying to cheat. Therefore, the person writing the lower price will be rewarded for honesty with an extra two dollars, and the other person will be punished by being paid two dollars less than the lower of the two prices. For example, if Lucy writes $54 and Pete writes $100, then Lucy will be paid $56, and Pete will get $52. What will the two travellers write to maximize their gains? That is the Traveller’s Dilemma.
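The manager's rule is easy to state explicitly. The following is a small sketch (mine, not Basu's; the function name is my own) of the payoff computation, using the $2 reward/penalty described above:

    def td_payoffs(lucy, pete, bonus=2):
        """Payoffs (Lucy, Pete) for claims between 2 and 100 dollars."""
        if lucy == pete:
            return lucy, pete               # same claim: both are paid that amount
        low = min(lucy, pete)
        if lucy < pete:                     # Lucy claimed less: she gets the bonus
            return low + bonus, low - bonus
        return low - bonus, low + bonus     # Pete claimed less: he gets the bonus

    print(td_payoffs(54, 100))   # (56, 52), the example in the text
    print(td_payoffs(99, 100))   # (101, 97)
    print(td_payoffs(2, 2))      # (2, 2), the Nash-equilibrium payoff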


Although most people would choose 100 or a number close to it, logic demands that both players should choose 2. The actual price of each antique was considerably lower than $100, but let us first suppose that Lucy decides to write $100, hoping that Pete will be similarly greedy. Lucy’s logic is that if they both wrote $100, they would both get $100. But she soon realizes that she can get a little more money by writing 99, because she would then get $101 if Pete has written $100. But the same thought can occur to Pete also, in which case Lucy would be better off writing 98: she would then get $100 if Pete wrote 99 or 100. Once again, if this idea occurs to Pete as well, Lucy should be writing $97. This kind of logic (called backward induction) can go on until both players are driven down to writing $2. So if both players in the TD game are rational, their joint payoff should be (2, 2). The table below (from Basu 2007), with the intermediate rows and columns omitted, gives the payoff matrix for this two-player strategic game.


          2         3         4       ...       98         99        100
  2     2, 2      4, 0      4, 0      ...      4, 0       4, 0       4, 0
  3     0, 4      3, 3      5, 1      ...      5, 1       5, 1       5, 1
  4     0, 4      1, 5      4, 4      ...      6, 2       6, 2       6, 2
 ...     ...       ...       ...      ...       ...        ...        ...
 98     0, 4      1, 5      2, 6      ...     98, 98    100, 96    100, 96
 99     0, 4      1, 5      2, 6      ...    96, 100     99, 99    101, 97
100     0, 4      1, 5      2, 6      ...    96, 100    97, 101   100, 100

The leftmost column in this table gives Lucy’s choices in dollars, and the top row gives those of Pete. The cell where a chosen row and a chosen column intersect shows the payoffs of Lucy and Pete, respectively. For example, if Lucy chooses 4 and Pete chooses 98, the payoff to Lucy is $6 and that to Pete is $2.

The payoffs (2, 2) obtained when both players choose 2 represent a Nash equilibrium: any unilateral deviation from the Nash choices gives a lower payoff. For example, if Lucy stays at the equilibrium by choosing 2, then Pete does worse by choosing any number other than 2. To give an example of an outcome that is not a Nash equilibrium, consider (100, 100). Lucy will be better off choosing 99, because then the outcome will be (101, 97); she does better by deviating from (100, 100).
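The claim that (2, 2) is the only Nash equilibrium of this game can be verified by brute force. Here is a short, self-contained sketch (my own illustration, not from Basu) that checks, for every pair of choices, whether either player could gain by a unilateral deviation:

    def payoffs(a, b, bonus=2):
        """Payoffs (player 1, player 2) under the Traveller's Dilemma rule."""
        if a == b:
            return a, b
        low = min(a, b)
        return (low + bonus, low - bonus) if a < b else (low - bonus, low + bonus)

    choices = range(2, 101)
    equilibria = []
    for a in choices:
        for b in choices:
            p1, p2 = payoffs(a, b)
            # (a, b) is a Nash equilibrium if neither player can do strictly
            # better by deviating on his or her own.
            best1 = max(payoffs(a2, b)[0] for a2 in choices)
            best2 = max(payoffs(a, b2)[1] for b2 in choices)
            if p1 >= best1 and p2 >= best2:
                equilibria.append((a, b))

    print(equilibria)   # [(2, 2)]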

We note that if each player in the TD has a choice of only 2 or 3, instead of every integer from 2 to 100, we get a payoff table similar to the table for the Prisoner's Dilemma (PD) I gave in Part 92. However, unlike in PD, there is no dominant choice (cf. Part 92) in the full version of the TD depicted in the above table. If Pete chooses any number from 4 to 100, Lucy would do well to choose a number greater than 2.

The existence of the Nash equilibrium in the TD game presents a logical paradox. Although it is a rationally correct solution, most of us would choose a number much higher than 2. It appears that our intuition contradicts the tenets of game theory.

Classical economic theory was based on the axiom that people make choices that classical game theory can predict, and classical game theory assumes that people are selfish and rational. But in practice, as demonstrated by several experiments in which participants played for real money and a variety of magnitudes of rewards and punishments were tested (Basu 2007), people on average do not play the Nash strategy. For low rewards, the average choice of numbers was high, and it fell somewhat when the rewards were increased. Repeated playing of the TD threw up some surprises as well. For high rewards, the play converged over time down to the Nash-equilibrium value; but for low rewards, repeated play went in the other direction, i.e. further and further away from the Nash equilibrium. Such experiments come under the relatively new field of research called experimental economics.

Why do people behave in such an illogical, irrational manner? Why do they not follow the rational path to the Nash equilibrium? Several reasons have been advanced (Basu 2007):

1. Many people are not capable of deductive reasoning, and end up making illogical, impulsive, irrational moves. But then why do even game-theorists, who presumably do not belong to this category, and who know how to reason deductively, end up making irrational choices? It turned out in real experiments that they chose high numbers because they expected other participating game-theorists to choose high numbers. Why?

2. Perhaps there is a tussle between altruism and selfishness in our brains, with simple-minded Darwinism favouring selfishness.

3. Perhaps many of us do not like to ‘let down’ our fellow traveller in the TD game just for the sake of getting rewarded an additional dollar. So we may choose 100, knowing fully well that choosing 99 can give us $101.

This is really an unsolved problem. Even if we eliminate factors such as faulty reasoning, altruism, and socialization, it is still doubtful whether people would play logically and ruthlessly. People don't always make selfish rational choices (Ball 1999).