The big news in gaming this past week is that Google DeepMind turned its attention to another game and won a dominating victory, this time in StarCraft II. But are we seeing the same sort of thing we did with Go and Chess? I’m not sure, and I am probably not going to look deeply enough to make a judgment, but I wanted to pass along a couple of links.
For those who missed previous DeepMind adventures: Google has a machine learning setup that it is turning loose on games with amazing results. Go was long considered a space where humans would continue to beat computers because it has a very large decision space. As it turns out, no, modern computers can trounce that, and AlphaGo is the best player in the world, getting even better when it completely ignores the human history of playing Go and starts with only the rules, playing itself millions of times. AlphaGo was so good that human commentators could not see what it was doing and were laughing at its “mistakes” as it beat the best human players alive. Oh, it turns out humans have been playing Go suboptimally for hundreds of years. And then AlphaGo got better.
StarCraft II is a similarly large next jump, a game without perfect information, in real time, with an even larger decision space. I can just point you towards the DeepMind AlphaStar post on it. AlphaStar can beat professional StarCraft II players, apparently pretty consistently. You can see more discussion from the team at their Reddit AMA.
I found this counterpoint rather compelling. The argument is that AlphaStar is winning through superhuman clicking, not superhuman strategy as was the case in Go. AlphaStar was, so the argument goes, executing strategies that depend on perfect micro: telling individual units exactly where to move and what to attack. AlphaStar will always win fights between evenly matched forces because it can perceive and act quickly enough to target individual shots, and its main strategic difference is favoring units where that precise placement matters most.
I am not qualified to comment on the extent to which AlphaStar is making human-imitating spam clicks versus effective clicks. I do find it compelling to point out that AlphaStar hits 1500+ APM (actions per minute) at critical moments, particularly the example of having three units teleport simultaneously to three different points for an attack. Even if you were capable of perfect precision, I cannot see a human using the mouse and keyboard fast enough to do that … while also engaging in 19 other clicks in the same second. Being able to act without an interface and off camera is really big, and it is not the sort of mastery intended here.
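To put that number in perspective, here is a quick back-of-the-envelope conversion. It is only a sketch: the function names are mine, and it assumes the 1500 figure is an instantaneous rate spread evenly across a second, which may not be how the replay tools actually measure it.

```python
def actions_per_second(apm: float) -> float:
    """Convert actions per minute (APM) to actions per second."""
    return apm / 60.0


def ms_per_action(apm: float) -> float:
    """Average gap between consecutive actions, in milliseconds."""
    return 60_000.0 / apm


burst_apm = 1500  # the peak figure cited above; treated as an even rate (assumption)

print(actions_per_second(burst_apm))  # 25.0 actions every second
print(ms_per_action(burst_apm))       # 40.0 ms between actions
```

At that rate there is one action roughly every 40 milliseconds, each of which would also have to land on the right target.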
AlphaStar is intended to demonstrate human-equivalent or -superior decision-making. If it instead teaches itself to exploit human-superior clicking, that can produce a win without achieving the goal, like the machine learning story about the computer that learned to win games by making the opposing computer crash. That is not winning within the game.
In the end, though, this is probably the most important comment. Yes, there is a lot to consider about what speed and accuracy restrictions the computer needs, but ultimately this was the proof of concept. With more time and practice, we should expect AlphaStar to do amazing things within the game.
And remember, AlphaGo got better than any human had ever been by learning from humans. It got better than that by learning on its own. Twice.
: Zubon