Pages

Saturday 31 August 2013

95. Zero-Sum Two-Player Games and the Minimax Theorem



In zero-sum two-player games a single number, i.e. the amount won by the first player, and therefore the amount lost by the second player (or vice versa), determines the payoff. If it is a finite game, its normal form is a matrix of payoffs, such as that shown below (Owen 1992).

1       3       5
2*      4       3
0       1      -3

The entry with a star is a 'saddle point' (see below).

Here each row corresponds to the strategies of player I, and each column to those of player II. Suppose player I chooses a strategy corresponding to row 2. The payoffs are 2, 4, or 3, depending on the strategy chosen by player 2. Thus, player I can be certain of winning at least 2 units. By contrast, if player I chooses row 1, what he can win at least is 1 unit. Similarly, if he chooses row 3, what he can win at the most is 1 unit, or he may lose as much as 3 units. Thus row-2 strategy is his 'maximin strategy' (it maximizes his minimum winnings), and is called the security level of this player. The security level of a player is the largest amount he can be sure of winning, no matter what games the other player plays.

Now consider player 2. For columns 1, 2, and 3, his maximum payoffs are 2, 4, and 5, respectively. Thus column 1 corresponds to his 'minimax strategy' (it gives the minimum of the three maximum payoffs).

In this example, 2 units is both the maximin and the minimax payoff. By choosing row 2, player I is sure to win at least 2 units, and by choosing column 1, player II is sure to lose no more than 2 units. It is a zero-sum game, and row 2 and column 1 are optimal strategies. A game of this type in which the maximin of one player coincides with the minimax of the other player has a saddle point in the payoff matrix; the saddle-point payoff is 2 units in our example.


Suppose we make a 3-dimensional plot of the payoff to player I as a function of the two strategies plotted along the other two axes. The maximin strategy will be seen at the valley in this plot. Now suppose we plot the payoff for player II as a function of the two strategies. The minimax strategy will show as a peak in this plot. When the maximin and the minimax coincide, we see a saddle point in the plot. If the two players can be presumed to be rational, they would both hit the saddle point without any need to communicate with each other, except through the consequences of the games played by each.

This is a notable result from the point of view of emergence of swarm intelligence and complex behaviour. Each bee in a beehive is obviously ‘rational’ in the sense that it has no emotions to speak of. Its actions are determined by the innate tendencies (simple rules) dictated by its genes. Although there is an apparent visual ‘communication’ with other bees (the waggle dance), the fact remains that even the response to the dance is nothing more than an operation of simple local rules in a rational manner. The existence of the saddle point, although arrived at above in the context of a zero-sum two-player game, points to the existence of a sustainable level of survival of the bee species, each bee playing the role of a rational player at the individual level.

It was von Neumann who showed in his pioneering work on game theory (cf. Part 91) that for two-person zero-sum games, the idea of rational behaviour can be extended by using the concept of individual rationality.

In general, the maximin and minimax strategies may not give the same payoff, and this is illustrated below in the payoff matrix for a rudimentary version of the normal form of an elementary game of poker (Owen 1995). Here each row corresponds to the strategies of player 1, and each column to those of player 2. Player 1 gets to choose from four possible strategies, and player II must choose from two possible strategies. Here the maximin is -0.6 in row 2, and the minimax is -0.2 in column 1. The difference is not zero, and represents an indeterminacy of the game.

-1.8     1
-0.2   -0.6
-2.2     1
-0.6   -0.6

In such situations it is advisable for the two players to adopt mixed strategies. The idea is to prevent the other player from guessing the strategy by choosing a row or a column according to some randomizing scheme. For example, suppose player 1 plays row 1 with probability 1/8, row 2 with probability 7/8, and does not play rows 3 and 4 at all. The four choices can be represented by a vector x = (1/8, 7/8, 0, 0), and give player 1 the winnings of at least -0.4. This means that he expects to lose no more than 0.4 units, irrespective of what game the opponent plays. Similarly, player 2 may be advised to play either column with a probability 1/2 (y = (1/2, 1/2)). The expected payoff for this approach is at the most 0.4.

With this mixed strategy, player 1 expects to lose no more than 0.4, and player 2 expects to win at least 0.4. The number -0.4 is the value of the game.

We can generalize. Consider a game in which the payoff matrix A is an m x n matrix. The mixed-strategy for player 1 is a vector x = (x1, x2, . . ., xm). The components of this vector, which are probabilities, are nonnegative, and add up to 1. For player 2 there is a similar mixed-strategy vector y = (y1, y2, ... yn). The expected payoff for mixed strategies used by the two players can be written as

P(x, y) = xAyt

The maximin in this general case is

VI = maxx miny xAyt

And the minimax is

VII = miny maxx xAyt

The minimax theorem

The minimax theorem says that for all matrix games,

VI = VII.

The common value of the payoff is called the value of the game. The maximization of x and the minimization of y are optimal mixed strategies.

Linear programming is the most common technique used for the computation of optimal strategies in a matrix game.

Chess is an example of a zero-sum two-player game. There is a theorem in game theory according to which any finite, zero-sum, two-player game like chess has an optimal solution. That is, there is an optimal way of choosing moves in chess that will make a player do better than any other set of moves. But the total possible number of moves in chess is huge: ~10120. The problem is that no one knows what that optimal solution is. One never plays the same game twice.

Suppose there are more than two players in a game; i.e. it is a plural game. In any game, whether dual or plural, the total payoff may be zero (zero-sum games) or nonzero (nonzero-sum games). For the latter category, the interests of the players may not be opposite always; i.e. both noncooperation and cooperation are conceivable. I shall focus on noncooperative games in the next post.

No comments:

Post a Comment