Saturday, August 10, 2013

92. Strategic Games: the Prisoner's Dilemma

Unlike in the extensive form of games I discussed in Part 91, the strategic or 'normal' form of games does not involve details like 'positions' and 'moves'. Rather, we speak in terms of 'strategy' and 'payoff'. The best known example of such a game is chess, but there are many other examples as well.

One specifies a strategy set (also called action space) Ai for each player i (i = 1, 2, . . . n). Thus, each player can choose a strategy from this possible set of strategies, after considering all the players and their strategies. All players make their choices of strategy simultaneously. At the end of the game, each player receives some payoff. The choice of strategy made by each player may influence the final outcome for all the players.

If the players are assumed to be rational, they make choices which result in the outcome that they prefer the most, given what their opponents do. For example, a player may have two strategies A and B such that, given any combination of strategies of the other players, the outcome from strategy A is better than the outcome from B. Then strategy A is said to dominate B. A rational player will always play a dominant strategy if possible.

The payoffs are generally modelled as having numerical values. For player 1, the strategy set is, say, A1. Suppose he/she chooses strategy a1 from this set. Similarly, any player i may choose a strategy ai from the set Ai. Then the payoff to a player j (j = 1, 2, . . . n) is a function fj(a1, a2, . . . an), called the payoff function for player j.

The strategic form of a game is defined by the set N of players, the sequence A1, A2, ... An of the set of strategies of the players, and the sequence f1(a1, a2, ... an), f2(a1, a2, ... an), ... fn(a1, a2, ... an) of the set of payoff functions for the n players.

A game in strategic form is called a zero-sum game if the sum of the payoffs to all the players adds up to zero.

Two-player zero-sum games are the best investigated. Each such game has a value, and the two players have optimal strategies that guarantee the value. Because of the minimax theorem, such games have clear-cut solutions.

Two-person non-zero-sum games can be tricky business. Usually such games do not have values or optimal strategies for the players. Two types of theoretical approaches can be used for dealing with such games: Noncooperative theory, and cooperative theory.

In the noncooperative form of the game, the players may not form binding agreements. This can happen, for example, during negotiations between sovereign nations when there is no overseeing body to enforce agreements. Similarly, in business dealings, trading companies may be forbidden by law to enter into certain types of agreements.

John Nash, John Harsanyi and Reinhard Selten were awarded the Nobel Prize in economics in 1995 for work in this area. In their formalism, the concept of strategic equilibrium or Nash equilibrium replaces value and optimal strategy: a group of players is said to be in Nash equilibrium if each of them is making the best decision that he or she can, taking into account the decisions of the others.

In cooperative game theory (the other approach to non-zero-sum games), the players can form binding agreements. Thus there is a strong incentive to cooperate for maximizing the total payoff. But now the problem is about how to split the payoffs between the players.

There are two kinds of cooperative games. The game may have transferable utility, or it may have non-transferable utility. Transferable utility is relevant when the players measure utility of the payoff in the same units and there is a means of exchange of utility, such as side payments.

The payoffs in the strategic form of a game can be presented in the form of a table, each cell of which represents a distinct strategy combination. Such a tabular presentation becomes possible because the players choose their strategies simultaneously. Let us consider an example.

There are two prisoners suspected of a serious crime. There is no judicial evidence for the crime except if one of the prisoners testifies against the other. The two are interrogated separately. If one of them incriminates the other, he will be let off without any punishment (amounting to a payoff of, say, 4), whereas the other prisoner will be awarded a long prison sentence (payoff 0). If both testify against each other, their punishment will be somewhat less severe (payoff 2 for each). However, if they both ‘cooperate’ with each other by not testifying at all (i.e. by choosing to remain silent), they will only be given a very short term in jail (due to lack of strong evidence), for example for possession of illegal weapons (payoff 3 for each, with a total payoff 6). This is a non-zero-sum strategic game: The total payoff is 4 if both incriminate each other. It is 4 if any one of them testifies against the other; the defecting prisoner is rewarded with immunity from prosecution, and the other gets a long term in jail. If they cooperate with each other (unknowingly, of course), both get only a very short term of imprisonment, and the total payoff is 6.

To defect or to cooperate, that is the prisoner’s dilemma (PD).

In this two-player game, let the first prisoner be called player I, and the other one is player II. And let C or c stand for cooperation, and D or d for defection: C and D are the options for player I, and c and d those for player II. The payoff table for the PD game is shown in the payoff matrix below.





Player I can choose either row C or row D. At the same time, player II chooses column c or column d. The possible strategy combinations for the two players are: (C, c), (D, d), (C, d), and (D, c). For (C, c) the payoff for each player is 3, and that for (D, d) it is 2 for each player.

If only player I defects, i.e. strategy (D, c), the payoff to player I is 4, and that to player II is 0. If only player II defects (strategy (C, d)), he gets a payoff of 4, and player I gets 0.

We note that the ‘defect’ strategy dominates over the ‘cooperate’ strategy. Consider player I. If II chooses c, the payoff for I is 4 for D and 3 for C. If II chooses d, the payoff for I is 2 for D and 0 for C. In either case, D dominates over C for player I. And the symmetry of the payoff matrix implies that for player II also, ‘defect’ dominates over ‘cooperate.’

Thus the selfishness of the independently acting players in the PD game makes defection a dominant strategy, and cooperation a dominated strategy resulting in a lower total payoff for the two players. Cooperation would have given them the best total payoff of 6.

There are many real-life versions of the PD game wherein individual defections at the expense of others lead to outcomes which are, on the whole, less profitable. Here are some examples: arms races; litigation instead of settlement; environmental pollution. Treaties or laws are introduced sometimes to enforce cooperation.

The PD game gets fundamentally changed if played more than once by the same players. In such a repeated game, patterns of cooperation emerge as rational behaviour. The fear of punishment in the future outweighs the gain from immediate defection.

I shall dwell on this some more in the next post, after discussing the Traveller's Dilemma (TD) game, crafted by Kaushik Basu. PD is a special case of TD.