Friday, November 04, 2016

How Darwin plays StarCraft

StarCraft is perhaps the single hardest game for computers to play well. At least if you only count games that people care about; you could of course construct games that were harder, but there's no guarantee anyone would play those games. When doing AI research, working on games that people care about means you are working on relevant problems. This is because games are designed to challenge the human brain and successful games are typically good at this. StarCraft (and its successor StarCraft 2) are played and loved by millions of people all over the world, with a very active competition scene where pro players are well-paid stars.

And there's no question that the game is hard; there is a series of AI StarCraft competitions that has been running since 2010, but the best AI players are still at the level of human novices. In other words, roughly where the best AI Go players were 15 years ago, or the best AI Chess players were 50 years ago. As computers are now able to play Chess and Go better than the best humans, the question is when we can surpass human ability for StarCraft as well.

It's not just me thinking this. Google DeepMind recently announced that StarCraft 2 will be one of their major new testbeds, after their success at training deep networks to play Atari games in the ALE framework. Facebook AI Research recently published their first paper on using machine learning to learn to play StarCraft and just today submitted another, showing that they take this challenge seriously. In academia, there is already a rich body of work on algorithms for playing (parts of) StarCraft, or generating maps for it. Given the game's complexity, it is unlikely we will conquer all of it soon; we have our work cut out for us.

A screenshot from the original StarCraft game

One of the reasons the game is so hard is that playing it well requires thinking and acting on different levels of abstraction. The game requires resource collection management, build order scheduling, prioritizing technology development, exploration, micro-management of troops as well as overall strategy and ways of deducing and countering the adversary's strategy. Trying to build an AI that can do all this well is very very hard. It is therefore prudent to approach the various parts of the problem separately.

In a new paper, we propose a new algorithm for playing StarCraft micro, given a forward model. "Micro" is the second-to-second, sometimes frame-to-frame, business of managing armies of StarCraft units in combat. The difficulty of playing micro is the reason professional (human) StarCraft players often average several hundred mouse-clicks per minute. To an unprepared onlooker good micro play tends to look chaotic, while it is in reality a highly complex affair with certain maneuvers requiring extreme skill.

A StarCraft battle with no micro tactics.
The green troops to the left don't move at all, and lose the battle.

The same battle with active micro tactics.
By moving around units depending on their health and cooldown level, a much better results is achieved.

So, what AI methods can we use to play StarCraft micro? There have been a number of attempts to use various forms of tree search including Monte Carlo Tree Search (MCTS), a core component in AlphaGo, the software that recently beat Lee Sedol to become world champion at the game of Go. The problem with using tree search to play StarCraft is the extremely high branching factor, meaning the extremely high number of possible actions that could be taken at any time. Where Chess has an average branching factor of around 35, and Go has an average branching factor of about 300, StarCraft micro often reaches branching factors of millions. This is because you don't just move one piece, you often have 10 to 50 different units to control at the same time. And the number of possible actions increases exponentially with the number of units that can act at the same time. Complex indeed.

Standard tree search algorithms, including MCTS, perform very badly when faced with such enormous numbers of actions to choose from. Basically, there are so many actions to consider that they run out of time before even considering a single step forward. So we need to approach this problem some other way. In some work presented earlier this year, and which concerned another strategy game, we attempted to use evolutionary computation instead of tree search to play the game. This worked very well - I wrote a separate blog post about that work.

Portfolio Online Evolution (2 scripts) in the JarCraft Simulator versus script-based UCT

The basic idea is to run an evolutionary algorithm every time step to select what to do next. Each "chromosome" (or "solution" or "individual") is a set of actions - one or more actions for each unit in the game. All the chromosomes are then scored based on how good results they achieve in simulation, and then the good chrosomes are kept, the less good ones thrown away and replaced with mutated copies of the good ones, again and again. Essentially Darwinian evolution in a computer program. Well, actually it's a bit more complicated, but that's the gist of it. We call this method Online Evolution, because it uses evolution, but not to tune a controller ("offline") as is often done, instead evolution is used as an action selection mechanism ("online").

For StarCraft we wanted to combine this very effective method with a way of incorporating domain knowledge about StarCraft playing. Fortunately, Dave Churchill at Memorial University of Newfoundland had already come up with a clever idea here, in the form of Portfolio Greedy Search. The core idea here is to not select directly among the different actions (for example move to a particular place, or attack a particular enemy). Instead, his algorithm uses a number of existing "scripts", which are simple rules for what units should do in different situations. Churchill's method uses a simple greedy search algorithm to search for what script to use to control each unit each time step.

Which finally brings us to the new algorithm we introduce in our paper: Portfolio Online Evolution. As the name suggests, this is a combination of the ideas of Online Evolution and Portfolio Greedy Search. You might already have figured this out by now, but what it does is to evolve plans for what script each unit should use each time step. Each chromosome contains a sequence of scripts for each unit, and they are evaluated by simulating a number of steps forward in simulation and seeing the results of using this sequence of scripts. (Quite simply, we use the difference in total hitpoints between our team and the adversary as the score.)

Portfolio Online Evolution (6 scripts) in the JarCraft Simulator versus script-based UCT

So does Portfolio Online Evolution work in StarCraft? Hell yes! It kicks ass. Protoss ass. We tested the algorithm using the JarCraft simulator, which is very convenient to work in as the real StarCraft game itself lacks a forward model. JarCraft comes with a several tree search methods implemented. It turns out Portfolio Online Evolution beats all of them soundly. What's more, its margin of victory gets bigger the larger the battle size (number of units on each side), and the number of scripts supplied to the algorithm. We were of course very happy with this result.

So where did this leave us? We can't play a full StarCraft yet, can we? No, because the full StarCraft game does not have a forward model, meaning that it cannot be simulated much faster than real-time. Portfolio Online Evolution, like other methods that search the space of future game states, require a fast forward model. It seems that we will have to concentrate on creating methods for learning such forward models in games such as StarCraft, to allow effective AI methods to be used.

In the nearer term, one of our ongoing projects is to learn the scripts themselves, to expand the repertoire of scripts available for the evolutionary script selection mechanism to choose from.

Finally, a note on who did what: When I say "we", I mean mostly a team of students led by Che "Watcher" Wang from NYU Shanghai. The other participants were Pan Chen and Yuanda Li, and the work was supervised by Christoffer Holmgård and myself. The project started as a course project in my course on AI for Games, and Watcher then wrote most of the paper. The paper was presented at the AIIDE conference a few weeks ago.

No comments: