Friday, August 17, 2012

Freeman Dyson and William Press' minirevolution in game theory

Prisoners facing a dilemma recommended not to cooperate any longer

Fred Singer (*1924) has pointed out an interesting Physics arXiv Blog's review of a new preprint by Adami and Hintze that mainly builds on the important March 2012 paper
Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent (PNAS) review by William Poundstone (with interviews)
and that, aside from more important things to be discussed momentarily, challenges the stereotypes that creative scientists and math thinkers should be below 30 or 40. Fred is almost 88 years old and you may think that he would refer to somewhat younger people's research but you would be wrong. The paper above was written by William Press (*1948) and Freeman Dyson (*1923). ;-)

Researchers in game theory are currently fixing the holes in their lore. If you want to excite the community of game theorists, the 2012 data suggest that the optimum age for you could be 89 years or 64 years. ;-) What happened?

Consider the prisoner's dilemma i.e. the following game or situation.

Notorious climate alarmists Al Gore (A) and Bill McKibben (B) are finally arrested and each of them is allowed to either snitch or remain silent. If both men remain silent, both of them only get one year for a lesser crime. If one of them snitches, he is freed but the other one gets 6 years in prison. But snitching isn't a universal recipe for freedom: if both snitch, each of them gets 3 years in the jail.

(The original problem was talking about months, not years, in the prison but given my choice of the names, the timing looked preposterous so I had to change it to years, too.)

If Al and Bill communicate in advance, it's collectively better for them to agree to remain silent: when they "cooperate" (with one another), they only get one year. In that way, they get 2 years in total while any snitching would mean that they get 6 years in total. However, if you can't communicate with your accomplice, it's better to avoid the maximum prison term by snitching – by "defecting" (effectively trying to hurt the other guy). In that way, you get 0 or 3 years, depending on your partner's decision, which is better than 6 or 1 year. So at least the arithmetic average makes it better to snitch.

But is it the relevant average? If you knew something about the other fearmonger, your decision could be different and perhaps better. You don't have to spend the same time in the jail as he will, after all. But in the one-round game, you don't really know anything about the thinking or strategy of the other guy so the solution is ambiguous and depends on the priors.

Things become more interesting and complex if Al and Bill and arrested many times because in that case, you can no longer claim that you know nothing about the other villain's behavior. What is the best strategy for Bill McKibben if he's arrested together with Al Gore many times?

Until this year, it's been a part of the widely believed game theorists' lore that the best strategy for Bill McKibben is to copy the decision of Al Gore from the investigation of the previous crime (a "tit-for-tat" or fair strategy); in this description, I am not assuming anything about Al Gore's Al Gore Rhythm. What happens if Bill copies Al in this way? They asymptotically serve the same total prison term.

Why? Use the equations \(a_i=0,1\) and \(b_i=0,1\) when the \(i\)-the decision by Al or Bill is "silent" or "snitch", respectively. The difference between the total prison terms is 3 years times the sum \(\sum_i (a_i-b_i)\) because if their responses are the same, they get the same prison term. However, \(\sum_i(a_i)-\sum_i(b_i)\) vanishes up to the first and last term if \(b_{i+1}=a_i\) – the sums are just shifted.

For example, if Al is silent all the time, Bill will also be silent and each of them will get 1 year for each crime. If Al snitches all the time, so will Bill and each of them will get 3 years for each of the many crimes. If Bill notices that Al is always silent, it would be better for Bill to snitch all the time and remain free. So why did we say he should remain silent? It's because Al would notice he's getting 6 years all the time and he would (probably) adapt and modify his strategy, perhaps, after some time. So at some moment, Al's behavior becomes "hard to decipher".

Such an answer (Bill should just delay Al's answer) may have looked natural because of the symmetry that "should" be respected in some way, people thought, and because of computer models that provided people with some "evidence" that this strategy is optimal. Well, all this evidence was just a sloppy collection of prejudices. In Spring 2012, Press and Dyson found an ingenious strategy that is better. With this strategy, the villain may decide about his former partner's overall score; or set an "extortionate" linear relationship between both men's scores.

Yes, as far as I know, it's the first paper ever published in PNAS that, when properly formulated, envisions a repeated prison term for the two notorious climate alarmists.

This is at least a minirevolution in game theory because it changes a slogan upside down. The widely believed optimum strategy used to have this slogan:
To maximize your freedom, don't be too clever, don't be too unfair. Mediocre egalitarian folks will prevail.
Press and Dyson found out that it's better to be clever – and unfair in a clever way – after all. ;-) In other words, you may fool evolution and you may be better off with unfair strategies. (Karl Sigmund and Martin Nowak pointed out that a more accurate adjective than "evolutionary" for these strategies would be "adaptive" but I will keep on using the inaccurate adjective "evolutionary" here.) Dyson apparently did the maths for the paper, Press did the game theory.

The paper was quickly followed by another PNAS paper (mostly a review, with one table of simulation results) by Alexander Stewart and Joshua Plotkin (full text here, 2-page PDF). I know Joshua as my ex-fellow Junior Fellow.

Let me say what the insight is in different words. People would believe that the best strategies would be the "evolutionary strategies" – trying what works and what doesn't and choosing the better strategies from the trials. Press and Dyson showed that such strategies may be beaten by someone who knows more sophisticated maths. Their superior strategy may be expressed by setting a particular determinant to zero; it is a zero-determinant strategy. Graphically, all these strategies lie on a plane.

So the pragmatists, empiricists, crowd-sources may seem clever enough but there can be cleverer folks who may assure the pragmatists about their inferiority. When a pragmatist is facing such a superior player, his evolutionary strategy will lead him to conclude that it's best for himself to admit his own inferiority and accept strategies that, despite their relative quality (within the evolutionary strategies), keep the pragmatist in the jail for a longer time. ;-)

William Press realized the basic "qualitative insights" that something could be totally different than believed when he was trying to do computer modeling of such things. His computer, assuming that your own strategy always affects your own score, was crashing at various points. In some parameter space, the events were located on a plane! It was good news because the crashing of the computer model actually proved that the assumption was wrong. If you play against a zero-determinant-equipped former accomplice, your total score doesn't depend on your strategy at all!

Dyson and Press have also showed that if your foe only remembers a certain number of recent rounds, there is no reason for you to have a greater memory: this can't improve your strategies.

The zero-determinant strategy itself depends on two parameters, \(\chi\) and \(\phi\). If \(\chi=1\), it becomes a "fair" strategy that leads to the same score of both players; the tit-for-tat is a special example for \(\chi=1\) and \(\phi\), a less intuitive parameter (let me call it an axion), set to the extreme value of its a priori allowed interval. Different values of \(\phi\) for \(\chi=1\) lead to other cooperative strategies that differ from tit-for-tat by subtle nuances only.

However, you may pick \(\chi\gt 1\) which means that you will have an advantage over your (new) foe. You may also be "generous" to your (new) foe and pick \(\chi\lt 1\). In that case, he will be better off. If both players are using the zero-determinant strategy, each of them may set the foe's score but not his own. In such a situation, they may even agree on "enforceable treaties".

In the interview, Press says most of the things. Dyson recommends you the Stewart-Plotkin paper. The table 1 listing different strategies is quite impressive. It shows that the score is a decreasing function of the wins (for different strategies). Computer models were wrong, cooperation loses, and defection wins. It's kind of cute that these proved yet surprising conclusions are characteristically associated with Freeman Dyson, the world's most achieved non-PhD physicist, a maverick, and a critic of the climate models. ;-)

In the new paper, Adami and Hintze study those things in some detail and ask whether the zero-determinant strategies are evolutionary stable. Their being evolutionary stable would mean that no other strategy could start to spread and overtake a large population that was using the zero-determinant strategies at the beginning.

They find out that the zero-determinant strategies are not evolutionary stable. It's pretty much because they don't perform well against each other which gives an advantage to the champions of other strategies. Also, the generic evolution of the strategies drives them away from the zero-determinant subclass. There are lots of surprisingly complex ideas – the role of camouflaging in the stability of such strategies, time scales over which the knowledge of your foe is an advantage, and so on. But this blog entry was supposed to be just an introduction.


  1. Hi Lubos,I learned about the prisoners dilemma in a book by Hofstadter who also wrote the famous Gödel, Escher, Bach.

    I am surprised that there is a new solution and less coorparation should be the key. Could it be that their are different ways around how one defines the problem?. From the way you talk about the problem I get the idea that the goal is "winning" which means less years in prison than your opponent. But actually this would be not your goal in real life. In real life the goal would be least numbers of years in prison for yourself only. For this to signal coorparation is always key because otherwise you may easily spiral into a loop of mutual retaliation. From an evolutionary point of view it is important that evolution works on the level of the single individuum and also on the level of the species or tribe. Too much individual competition and your species may suffer.

  2. What difference is there between the higgs field and the aether ? Anyone ?

  3. Well, they could inject a regulator.
    I have some other issues in that mathematicians are still injecting certain prejudices and assumptions about behavior that are not consistent with human behavior or even quantum physics (e.g. interim communication is not a required for consistent cooperation) but hey what the heck, classical modeling without complex components is surely the way the world works right?

  4. Hi! The luminiferous aether of the 19th century was supposed to be composed of some building blocks similar to atoms - in fact, famous physicists had created a working model of an eather out of wheels and gears.

    Because of this composition, the aether would choose a preferred reference frame - one in which the average velocity of its building blocks is zero. In all other reference frames, there would be an "aether wind".

    However, the Higgs field isn't created out of any localized building blocks and it therefore chooses no preferred reference frame. Even though the vacuum condensation value of the field is nonzero, this "condensate" preserves the Lorentz symmetry. It follows that the special relativity continues to hold even in the presence of the Higgs condensate - the laws are Lorentz-symmetric. For this reason, the vacuum with the Higgs condensate may still be considered "empty". After all, it's a matter of convention whether we call the vacuum expectation value h=0 or h=v.

    You may "generalize" the word aether so that it includes Lorentz-invariant entities such as the Higgs field with a vev - but that wasn't the expectation of the 19th century advocates of the concept of the aether.

  5. And another point, after reading the paper, it should be pointed out to Dyson that plays can only occur at the rate of the slowest member, so if players have finite lifespan, the max number of plays is dictated by the member with the lowest score (so they control the rate of play simply because they are the one's out of the game most of the time). So if the evolutionary player is consistently losing, then they are also consistently controlling the rate of play. So there is a feedback mechanism that places a level of control on the ZD player. So while over some number of plays the ZD player can achieve a higher score, within some time envelope, where players have finite life, then the ZD player can still be beaten since they need multiple plays to understand the evolutionary players behavior. So the players could run out of time before the ZD player ever figured out what his opponent is actually up to. Dyson even almost states this sort of dilemma in para 2 on page 4 of his own paper. Not to mention, the assumption is that the evolutionary player is actual intelligent enough to try to optimize their strategy, an it would take some number of iterations for ZD to realize that possibility.

  6. Only years for Bill and Al? Shouldn't it be decades?