An exploration of game theory and genetic algorithms, watching reciprocity coalesce in a soup of self-interest
"Discoveries", if that's the word I'm looking for
I spent some free time in pursuit of faith: I watched genetic algorithms develop strategies for playing an iterated Prisoner's Dilemma game.
I didn't add to the world's knowledge. There's a scholarly paper by some guy named Lindgren that talks about similar stuff. That paper is more rigorous and explored many things I didn't.
On the other hand, I wouldn't have understood Lindgren's paper without doing what I did. E.g., Lindgren said, "It is well known that the introduction of spatial dimensions may increase the possibility for cooperative behaviour..." Maybe it is well-known amongst Lindgren's peers, but it was news to me.
When I started out, I didn't know about what research had already been done. I was just an amateur. Jonathan Magasin, a co-worker, knew I was interested in the Prisoner's Dilemma and pointed me at an example in a book on genetic algorithms. I stumbled along from there. I'm writing this for the amateurs, people who don't know from Lindgren. Maybe you can dodge some of my mistakes.
The Prisoner's Dilemma is a trading game for two players. Each player has something that they can trade with the other.
It's a non-zero-sum game. In this context, that means that each player values his competitor's commodity more than his own. For example, player A might have a loaf of bread; player B might have a jar of peanut butter. Bread is nice, peanut butter is nice, but peanut butter sandwiches are really good. If either player can get a bit of the other's commodity, he can get a sandwich.
Though there are two players, they aren't trying to 'beat' one another. Each wants a high score; neither cares whether their score is higher or lower than the other player's score.
I said it's a trading game, but really it's a giving game. In this game, the players have no way to enforce a trade. They have no way to communicate other than by giving each other stuff. Player A cannot ask Player B for peanut butter. Player A can't offer some bread in exchange for peanut butter; he can only give Player B some bread and hope for the best.
Neither player really wants to give up anything. Having a sandwich and some leftover bread is a little nicer than just having a sandwich. Thus, each player's best strategy is to not give anything, but hope the other player gives something nonetheless. Assuming both players follow the best strategy, each will be stuck eating plain bread/peanut butter. (This outcome disturbed many people, perhaps because the game was originally described as a moral dilemma of two prisoners trying to decide whether to rat each other out.)
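To make that "best strategy" argument concrete, here's a little Python sketch of one round. The function name is mine; the point values (giving costs the giver 1 point, the receiver gains 2) are the ones my grid-world program, described later, would use.

```python
# One round of the Prisoner's Dilemma, scored with the point values the
# grid simulation later in this essay uses: giving costs the giver 1
# point and hands the receiver 2 points.
def score_round(a_gives, b_gives):
    """Return (a_score, b_score) for one round."""
    a, b = 0, 0
    if a_gives:
        a -= 1
        b += 2
    if b_gives:
        b -= 1
        a += 2
    return a, b

# Whatever the other player does, a player scores 1 point more by
# holding back -- so "never give" dominates a single round.
print(score_round(True, True))    # both give: (1, 1)
print(score_round(False, True))   # A holds back: (2, -1)
print(score_round(True, False))   # B holds back: (-1, 2)
print(score_round(False, False))  # neither gives: (0, 0)
```

Notice that both players giving beats both players holding back, yet each player individually does better by not giving: that's the dilemma.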
"Iterated" means that the players play a few times in a row. Each day, Player A gets another loaf of bread; each day, Player B gets another jar of peanut butter. Each day, each player can decide whether to give the other player something. The player remembers what happened the day before, and that's when things get interesting.
In the iterated Prisoner's Dilemma game, the best strategy is something like: give on the first day; on each following day, do whatever the other player did the day before.
This strategy is known as 'tit for tat.' If a player uses 'tit for tat' with another player using a similar strategy, both will dine on sandwiches every day. If the other player never gives, the 'tit-for-tat'-using player loses some bread on the first day, but never falls further behind than that.
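Here's a sketch of tit-for-tat in an iterated match, again assuming the give-costs-1/receive-gains-2 point values my program would use. The strategy and match functions are my own illustrations, not from any library.

```python
def tit_for_tat(history):
    # Give on the first day; after that, do whatever the other player
    # did the previous day.  `history` is the other player's past moves.
    return history[-1] if history else True

def always_defect(history):
    # Never give; just sit and collect.
    return False

def play_match(strat_a, strat_b, rounds=10):
    """Iterate the game: each gift costs the giver 1 point and earns
    the receiver 2 points.  Returns (a_score, b_score)."""
    a_hist, b_hist, a_score, b_score = [], [], 0, 0
    for _ in range(rounds):
        a_gives = strat_a(b_hist)  # A reacts to B's past moves
        b_gives = strat_b(a_hist)
        a_score += (2 if b_gives else 0) - (1 if a_gives else 0)
        b_score += (2 if a_gives else 0) - (1 if b_gives else 0)
        a_hist.append(a_gives)
        b_hist.append(b_gives)
    return a_score, b_score

print(play_match(tit_for_tat, tit_for_tat))    # (10, 10): sandwiches every day
print(play_match(tit_for_tat, always_defect))  # (-1, 2): one lost loaf, no more
```

Two tit-for-tat players cooperate forever; against a defector, tit-for-tat loses exactly one round's worth of bread and then stops giving.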
Tit-for-tat is pretty co-operative. It's happy to give to others as long as those others give back. Those people who were using the Prisoner's Dilemma to model the arms race--they were glad that such a co-operative strategy was so successful.
First, "algorithm" as explained by the Random House College Dictionary:
al.go.rithm, noun a set of rules for solving a problem in a finite number of steps, as for finding the greatest common divisor.
When a programmer starts writing code to solve a problem, she (hopefully) has a plan in mind. This plan is an algorithm. Typically, programmers come up with algorithms with their brains.
It's often unclear which algorithm is best for solving a problem. What's the best algorithm for racking up big scores at the Prisoner's Dilemma? "Uhm, based upon the other player's past actions, figure out whether or not to give. If I give and they don't, then I should stop giving. If I don't give but they do, then I should continue to not give." That leaves the programmer with a lot of leeway, a wide range of choices. How can we find the best strategy within this range?
Okay, "genetic," as explained by Random House:
ge.net.ic, adj ...of, pertaining to, or produced by genes; genic.
Mother Nature was faced with the problem, "What is the best way to survive on planet Earth?" She had a lot of leeway. She came up with the gene. Genes are like a set of instructions for slapping together proteins and thereby creating critters. Critters which survive will use their genes to create more critters.
Genes have some nice properties, properties which caught the eye of computer programmers: a critter passes its genes on to its children; genes occasionally mutate; and when two critters breed, their genes combine in new ways. Meanwhile, survival itself weeds out the genes that build bad critters.
A genetic algorithm tries to model this. First, we need to invent our own genetic code. Nature's genetic code consists of instructions for combining proteins. A genetic algorithm's genetic code will have a different set of instructions, instructions useful for solving the problem.
Next, we randomly create many gene strands. For each strand, we follow its instructions and see how well those instructions solve the problem. We discard the worst. The best survive.
We also try mutating some of the best genes. We try breeding some of them, too. We see how well these new genes solve the problem. Again, we discard the worst and keep the best.
When I say "we" do this, I really mean that the programmer writes a program that tries each of these gene strands, mutates and combines them. You might consider each gene strand to be a mini-algorithm. Each is a variation on a basic algorithm--the set of algorithms that can be described by our genetic code. Then the program which tests and combines these mini-algorithms is a kind of meta-algorithm, mixing and stirring the mini-algorithms to let the best of them rise to the surface.
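The loop above can be sketched in Python. This is a minimal, generic version of the meta-algorithm, run on a toy fitness function (count the 1s in a strand) standing in for "score at the Prisoner's Dilemma"; the population size, mutation rate, and crossover scheme are arbitrary choices of mine, not anything from my actual program.

```python
import random

def evolve(fitness, strand_len=16, pop_size=30, generations=40, seed=0):
    """A minimal generational genetic algorithm: create random strands,
    score them, keep the best half, then refill the population with
    crossovers and occasional mutants."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(strand_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # discard the worst
        children = []
        while len(survivors) + len(children) < pop_size:
            mom, dad = rng.sample(survivors, 2)
            cut = rng.randrange(strand_len)       # breed: one-point crossover
            child = mom[:cut] + dad[cut:]
            if rng.random() < 0.1:                # mutate: flip one gene
                child[rng.randrange(strand_len)] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Toy fitness: a strand is "fitter" the more 1-bits it has.
best = evolve(fitness=sum)
print(best, sum(best))
```

After a few dozen generations the population converges on strands that are nearly all 1s; swap in a fitness function that plays strands against each other at the Prisoner's Dilemma and you have the idea of my program.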
I wrote a computer program to pit different Prisoner's Dilemma strategies against one another. Strings of data would represent the strategies; I'd use genetic algorithm techniques to tweak these strands, and let better strategies emerge.
I didn't write the program all at once. I built it up in stages, testing each stage. Before diving into genetics, I decided to set up the world: a population of "co-operators" (who always give) and "defectors" (who never give).
Each critter would start with a few points. A critter could "give" to another critter: this would cost the giver a point; the receiver would get a 2-point boost. If two critters gave to each other, they'd both be better off: a Prisoner's Dilemma.
I decided that my world would be a grid. Each square of the grid could hold a critter. Each critter could interact only with his neighbors. When a critter ran out of points, it would die and its square would empty. As critters got older, the program would wither away their points unto death; all critters would thus be mortal.
Every so often, the program would pick an empty square: a new critter would be born there. The program would look at the squares around the empty square: whichever neighbor had the most points would be the parent. A new critter would come into being, and it would use the parent's strategy.
A critter with a dumb strategy wouldn't gain many points from its neighbors; as it aged, it would weaken and die quickly, taking its dumb strategy with it. A critter with a smart strategy would accrue many points; it would parent many children, passing its strategy on to them.
But I didn't have any smart strategies yet. I only had co-operators and defectors. They weren't smart, but at least I could test my program, make sure that I had the grid working.
When a defector goes up against a co-operator, the defector wins. Each turn, the co-operator gives to the defector; the defector just sits and collects. The co-operator withers; the defector grows fat. When I ran this first draft of my program, I expected the defectors to wipe out the co-operators. But that's not what happened.
"the introduction of spatial dimensions may increase the possibility for cooperative behaviour"
cccccccccc cccccccccc cccccccccc cccccccccc ccccDccccc cccccccccc cccccccccc cccccccccc cccccccccc cccccccccc
cc.ccccccc cccccccccc cccccccccc ccc.DDcccc cccDDDcccc cccDDccccc cccccccccc cccccccccc cccccccccc cccccccccc
cccccccccc cccccccccc ccccDccccc cccDDDcccc cccD.Dcccc cccDDDcccc cccccccccc cccccccccc cccccc.ccc cccccccccc
cccccccccc cccccccccc cccDD.cccc cccD.Dcccc cc...D.ccc cccDDDcccc cccDc.cccc cccccccccc cccccccccc cccccccccc
cccccccccc ccc.cccccc ccD.D.cccc cccD.Dcccc cc..DD.ccc cc.D.Dcccc cccDc.cccc cccccccccc cc.ccccccc cccccccccc
cccccccccc ccc.cccccc ccD.D.cccc cccD.Dcccc cc..DD.ccc cc.D.Dcccc cccDc.cccc cccccccccc c.cccccccc cccccccccc
cccccccccc ccc.cccccc ccD.D.cccc cc.D.Dcccc cc.....ccc cc.D.Dcccc cccDc.cccc ccDccc.ccc cccccccccc cccccccccc
cccccccccc cccDcccc.c ccD.DDcccc cc...Dcccc ccc....ccc cc...Dcccc cccD.Dcccc ccDccccccc cccccccccc cccccccccc
cccccccccc cc.DDccccc ccDDDDcccc cc...Dcccc ccc....ccc cc...Dcccc cccD.Dcccc ccD.ccccc. ccDccccccc cccccccccc
ccc.cccccc ccDDDccccc cc...Dcccc ccc..Dcccc cccc...ccc cc...D.ccc cccDDDcccc ccD.c.cccc ccDccccccc ccccccc.cc
When a defector meets a co-operator, the defector wins. The co-operator dies, the defector gives birth to another defector. After this happens a few times, the original defector is pretty much surrounded by defectors. Now it's in trouble: it has killed off its own food supply. Soon the defector weakens and dies. A clump of defectors isn't stable: its middle rots out, leaving a thin ring. If part of the ring dies off, then co-operators spill into the middle.
I expected the defectors to wipe out the co-operators. Instead, the grid was mostly populated with co-operators; a few strands of defectors pulsed within. The strands could never get wide--a defector in the middle of a thick strand would starve. The defectors would never take over, as long as they could only prey upon their neighbors.
I got a genetic algorithm up and working. My critters could remember what their neighbors had done to them for the last few turns. Critters could choose to give to each other. They could also attack one another. They could also choose to breed with one another.
I threw in that last feature figuring that nice critters might not want to breed with ornery critters. Two critters could only breed if both of them wanted to breed with each other at the same time. Critters could only reproduce by breeding.
The result was a mess. The original crop of critters had random genes. This led to some strange desires. Some masochist critters would only breed in response to attacks. Some would breed only under unlikely circumstances.
The result was a bunch of strategies that went through strange rituals to encourage one another to breed. Some critters might have excelled at Prisoner's Dilemma strategy, but there were enough masochists in the world that the best breeding strategy was often a cruel trading strategy.
I finally tried reading the introductory chapter to a book on genetic algorithms. It warned about this sort of thing. If you want to breed a critter that flies, your control program should choose breeders based on flying ability, like a eugenicist. If you let the critters decide, you may get a peacock instead of a falcon.
There's another thing I learned, something I could have learned without playing with genetic algorithms. Nevertheless, that's when I learned it.
I simplified my critters. They could no longer attack each other. They couldn't decide who to breed with: the control program decided for them.
I was looking over the results, trying to figure out what I was looking at. The emerging strategies were kind of like tit-for-tat, but not exactly. This wasn't surprising. The way I'd coded things, it was almost impossible that a pure tit-for-tat strategy would emerge. Actually, it was almost impossible that any pure strategy would emerge. There would be noise in the signal. If I'd tried to create a mini-strategy by hand within this program, maybe I could come up with something that acted like tit-for-tat 19 times out of 20. If you figure that many of these critters are going to have some mutant gene in them somewhere--that adds a little more noise.
People have run Prisoner's Dilemma-playing critters in noisy worlds before. They set up worlds in which a critter will accidentally make a mistake: giving when it didn't want to, or not giving though it wanted to. Other worlds had critters that occasionally remembered things wrong.
The tit-for-tat strategy doesn't do very well in a noisy world. A tit-for-tat critter won't give to another critter who didn't give in the previous turn. Even if that other critter is usually a good trading partner and merely had a rare brain-fart the previous turn, tit-for-tat won't give. But the key to prospering in the world of the Prisoner's Dilemma is maintaining good trade relationships.
In a noisy world, a world in which critters make mistakes, mercy and forgiveness are good survival traits. A critter who gives to another critter (even though the other critter didn't give last turn) might be able to establish (or re-establish) trade, and thus prosper.
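A sketch of why mercy pays: below, two critters play a long noisy match, where each intended move gets flipped by a brain-fart with some probability. The 5% noise rate and 30% forgiveness rate are numbers I picked for illustration; the give-costs-1/receive-gains-2 values are the ones from my program.

```python
import random

def tit_for_tat(history):
    # Give first; after that, copy the other player's last move.
    return history[-1] if history else True

def generous_tft(history, rng, forgive=0.3):
    # Like tit-for-tat, but after the other player withholds, still
    # give some of the time -- mercy that can restart a broken trade.
    if not history or history[-1]:
        return True
    return rng.random() < forgive

def noisy_match(generous, rounds=2000, noise=0.05, seed=2):
    """Two critters trade; each intended move flips with probability
    `noise`.  A gift costs the giver 1 point and earns the receiver 2,
    so each gift nets the pair +1.  Returns the pair's average score
    per round (0 = no trade, 2 = perfect cooperation)."""
    rng = random.Random(seed)
    a_hist, b_hist, total = [], [], 0
    for _ in range(rounds):
        if generous:
            a, b = generous_tft(b_hist, rng), generous_tft(a_hist, rng)
        else:
            a, b = tit_for_tat(b_hist), tit_for_tat(a_hist)
        if rng.random() < noise: a = not a   # brain-fart
        if rng.random() < noise: b = not b
        total += int(a) + int(b)             # each gift nets the pair +1
        a_hist.append(a)
        b_hist.append(b)
    return total / rounds

print(noisy_match(generous=False))  # strict TFT: every mistake echoes
print(noisy_match(generous=True))   # generous TFT recovers the trade
```

With strict tit-for-tat, one accidental snub echoes back and forth indefinitely; the forgiving version shakes off mistakes and spends most of its time trading.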
I didn't know this.
I looked at the strategies that came out of my program. Rarely did they withhold when a tit-for-tat strategy would give; often, they gave when a tit-for-tat strategy wouldn't.
I should clarify: this "merciful" approach doesn't work so well in a world populated by defectors. It works best after the world's been running a while, when a tit-for-tat-like strategy has prospered, when most critters will reciprocate to giving.
I said I was in pursuit of faith. Maybe that was a strange thing for an atheist to write.
I've snarkily said that religion is for people who think that they need an excuse to be nice to one another. I don't think it's a very nice thing to say. I'm usually pissed off when I say it, pissed off at someone who's implying that only religion allows niceness to emerge, that beasts couldn't come up with it on their own.
Many people seem to have an idea that mean critters are better at surviving than nice critters are. The iterated Prisoner's Dilemma gives the lie to that. Tit-for-tat's triumph shows that fairness can be a positive survival trait. As I found out, forgiveness can also aid survival.
The iterated Prisoner's Dilemma, without a genetic algorithm, shows that a strong, nice strategy can prosper. It doesn't show that such a strategy is likely to emerge in the real world, though. After all, the Communists have suggested a nice way to live--From each according to his ability; to each according to his needs. They still haven't figured out a way to reach this way of life without having some bully hijack their revolution.
The genetic algorithm showed me that a nice strategy can emerge from a primordial soup of random strategies. It must be as exciting as watching a primordial soup of organic chemicals yield proteins when struck by lightning.