AlphaStar: Maybe?

The big news in gaming this past week is that Google DeepMind turned its attention to another game, StarCraft II, and scored a dominating victory. But are we seeing the same sort of thing we did with Go and Chess? I’m not sure, and I am probably not going to look deeply enough to make a judgment, but I wanted to pass along a couple of links.

For those who missed previous DeepMind adventures: Google has a machine learning setup that it is turning loose on games, with amazing results. Go was long considered a space where humans would continue to beat computers because it has a very large decision space. As it turns out, no, modern computers can trounce it, and AlphaGo is the best player in the world, getting even better when it completely ignores the human history of playing Go and starts with only the rules, playing itself millions of times. AlphaGo was so good that human commentators could not see what it was doing and were laughing at its “mistakes” as it beat the best human players alive. Oh, it turns out humans had been playing Go suboptimally for hundreds of years. And then AlphaGo got better.

StarCraft II is the next big jump: a real-time game without perfect information and with an even larger decision space. I can just point you towards the DeepMind AlphaStar post on it. AlphaStar can beat professional StarCraft II players, apparently pretty consistently. You can see more discussion from the team at their Reddit AMA.
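
For a rough sense of what “very large decision space” means and why brute-force search stops working, here is a back-of-the-envelope sketch in Python. The numbers are standard textbook estimates, not figures from the DeepMind post:

```python
import math

# Back-of-the-envelope game-tree sizes (rough textbook estimates, not
# DeepMind's figures). Tree size grows roughly as branching ** depth,
# which is what makes brute-force search hopeless at this scale.
games = {
    # name: (approx. legal moves per position, approx. moves per game)
    "chess": (35, 80),
    "go": (250, 150),
}

for name, (branching, depth) in games.items():
    # Work in log10; the actual integers would be hundreds of digits long.
    log10_size = depth * math.log10(branching)
    print(f"{name}: ~10^{log10_size:.0f} possible games")

# Output:
#   chess: ~10^124 possible games
#   go: ~10^360 possible games
# StarCraft II is worse still: real time, hidden information, and a
# near-continuous action space, so the tree is not even this well-defined.
```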

I found this counterpoint rather compelling. The argument is that AlphaStar is winning through superhuman clicking, not superhuman strategy as was the case in Go. AlphaStar was, so the argument goes, executing strategies dependent on perfect micro, telling individual units where to move and attack. AlphaStar wins fights between even forces because it can perceive and react quickly enough to target individual shots, and its main strategic difference is favoring units for which that perfect placement matters most.

I am not qualified to comment on the extent to which AlphaStar is making human-imitating spam clicks versus effective clicks. I do find it compelling to point out that AlphaStar spikes above 1500 APM (25 actions per second) at critical moments, particularly the example of having three units teleport simultaneously to three different points for an attack. Even if you were capable of perfect precision, I cannot see a human using the mouse and keyboard fast enough to do that … while also engaging in 19 other clicks in the same second. Being able to act without an interface and off camera is really big, and it is not the sort of mastery intended here.

AlphaStar is intended to demonstrate human-equivalent or -superior decision-making. If it instead teaches itself to exploit human-superior clicking, that achieves the win but not the goal, like the machine learning story about the computer that learned to win games by making the opposing computer crash. That is not winning within the game.

In the end, though, this is probably the most important comment. Yes, there is a lot to consider about what restrictions on speed and accuracy the computer needs, but ultimately this was the proof of concept. With more time and practice, we should expect AlphaStar to do amazing things within the game.

And remember, AlphaGo got better than any human had ever been by learning from humans. It got better than that by learning on its own. Twice.

: Zubon

9 thoughts on “AlphaStar: Maybe?”

  1. I cannot understand the fixation with inhuman APM. This is exactly the field where computers are simply better than humans. The goal here was not to train an AI with human restrictions to be better than a human, but to train an AI that is better than humans with all the benefits it enjoys from not being a human. In that they succeeded. The AI arrived at a strategy that beat the human. Whether the human who was beaten could have implemented this strategy, had it occurred to him, is secondary in my opinion.

    We have to accept that. The call for limiting the AI’s APM sounds a lot like special pleading to save our fragile egos. You could call for a lot of other handicaps for the AI with the same reasoning. Why shouldn’t it have to blink, for instance, or why should it get limitless concentration and unfailing memory?

    1. That is to say that a dump truck is a better sumo wrestler than any human can be, because it can inexorably push anything out of the ring and no human can flip it. You could also say that AlphaStar is not playing the same game at all because it is not using the user interface, so it should be forced into the same hardware restrictions of using a mouse, keyboard, and monitor, along with needing visual recognition software to see what is happening on-screen.

      Except the difficulties of visual recognition and robotic mouse operation are not the problems that AlphaStar is trying to solve. And winning by micromanaging every individual unit on the board is not the problem AlphaStar is trying to solve, either.

      This is a common issue in machine learning: the computer optimizes for the goal as specified, often by doing something physically impossible or practically useless that nonetheless technically meets it. The computer meets its defined objective but is not solving the intended problem. (A toy sketch of this failure mode follows at the end of this comment.)

      Similarly, Mike Tyson could be one of the greatest chess players in the world. Upon sitting down at the table, he punches his opponent in the face as hard as possible, knocking them out. His opponent then runs out of time for making moves and forfeits. It would be special pleading on behalf of chess grandmasters’ fragile egos to stop Mr. Tyson from using his arms as effectively as possible while playing.

      He similarly defeats the best computer opponents by unplugging them.
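
      As promised above, a toy sketch of specification gaming. Every name, action, and number here is hypothetical: the objective only says “win,” so an optimizer has no reason to prefer the intended win over the exploit.

      ```python
      # Toy sketch of specification gaming; everything here is hypothetical.
      # The objective is written as "win the game"; nothing in it says how.

      def reward(outcome):
          """The goal as specified: 1 for any win, 0 for anything else."""
          return 1 if outcome == "win" else 0

      # Hypothetical actions available at one decision point, and their outcomes.
      outcomes = {
          "outplay_opponent": "win",   # the intended solution
          "crash_opponent": "win",     # technically meets the goal as specified
          "play_normally": "loss",
      }

      best_score = max(reward(o) for o in outcomes.values())
      best_actions = [a for a, o in outcomes.items() if reward(o) == best_score]
      print(best_actions)
      # -> ['outplay_opponent', 'crash_opponent']
      # The specified objective cannot tell the intended win from the exploit,
      # so the optimizer is free to learn either one.
      ```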

      1. I think your Mike Tyson analogy is wrong. Punching the opponent in the face is not playing chess. Issuing fast orders in SC is playing the game. If the AI had learned to trigger some bug in the game code to get an illegal advantage, I would say it would be wrong to allow that. But if it only issues orders faster than humanly possible, I don’t really see the problem. That’s just what computers do best.

        If on the other hand you are trying to level the playing field so humans can compete, that’s another thing. But I’m not sure that’s the right path if we want to get the best use out of the technology.

        I think that is more like slowing down the calculations of a chess computer so it’s not too strong against a human.

        1. You are arguing with something other than the point here, and I am not going to defend the position you are arguing against.

          That is not winning in terms of their own goal. Yes, the computer is faster, so just give it even more resources and it is even faster. That just becomes a hardware limitation: how much RAM can we throw at the problem? Their goal is not to see if they can make a virtual player with the most APM. Winning the computer game is a marker, not the goal itself. We already know computers can issue commands faster than humans. That is just RAM, not machine learning.

          The issue is not that the computer is winning a computer game. The issue is that it may not be winning in the intended fashion. AlphaStar IS winning WITHIN the game. StarCraft II is being used as a means of trying to get certain kinds of performance out of a computer. If it turns out that AlphaStar is winning just because it is very fast, we are not getting the desired kinds of performance, and millions (?) of dollars will have been spent on an answer that amounts to “give the AI more RAM.”

          No one cares about whether it is fair to human opponents.

          Try a different analogy, like a test in school. You will get a higher score if you know the questions in advance and memorize the answers. That is succeeding within the test while defeating the point of the test. No one cares about the test itself, except in what it tells us about something else.

            1. OK, I thought the goal was simply to win a game of StarCraft. And StarCraft was seen as a problem you cannot win by just throwing more and more resources at it. Unlike chess, the decision space is so much wider that you can’t just brute-force it, like chess computers did for a long time.

            If you see this exercise as some sort of optimization problem, it makes sense to set some constraints to force the learning process in a direction. I’m not sure if it’s the correct direction, but who knows.

            The overwhelming feeling I got when I was reading the Reddit thread was that the superior APM of the AI was seen as an unfair advantage. That the AI was winning in a way that was not pretty enough. That’s not reasoning I think is helpful.

            But I’m still not sure what the point of the test is if it’s not winning the game. Is it winning the game while playing like a human would? Or is the goal to find some new strategies a human could implement?

            Perhaps the best strategy for a game like this is to micromanage everything with an insane barrage of commands. If limiting APM leads to a better, less resource-intensive outcome, I’m all for it. If not, I think you should allow the computer to play how it plays best.

            And thanks for the discussion. I think I at least get now where you are coming from. I’ll think some more about it.

            1. My take on this is that StarCraft already has inhuman APM. The game itself is a single-player campaign as well as a multiplayer game. That computer opponent never uses the same actions a human could, but that’s always been allowed. To say “it’s unfair as it does not have restrictions” negates the fact that you were able to play that way all along. Is this really different from having a “nightmare” AI difficulty? Or a Civ match where the computer gets extra resources to balance for sub-optimal AI? Would it be different if Blizzard gave their okay and said they approve of its actions?

              I know it sounds like I’m backing up the same argument, but I find it really interesting that *this was never considered when it was designed*. No one seemed to anticipate that actions per minute would make that big of a difference, and I think that should be supported, not discouraged. How MUCH of an advantage is it if each of its 12 troops can individually aim each shot vs. my mass “everyone shoot that guy”? I look forward to even more unforeseen discoveries if we go down that road.

            2. You could look at it as a problem because humans have nothing to learn from such wins. Humans cannot replicate 1500 APM and the precision required.

              Humans could learn from AlphaGo’s strategy and improve their own play through that, even if they wouldn’t arrive at AlphaGo’s level.

              That is why humans respond differently to those wins.

              If there is nothing to learn from it, it’s not research, it’s just a toy.

  2. I think SC is a bad game to use here because at even a decent skill level, it’s far more of a micro game than a macro one. I’d love to see AI play against top players in something like Civilization, where not only is player decision-making a factor, but there is also a healthy amount of RNG to contend with.

    1. I don’t think DeepMind has approached any games with a social multi-player aspect. The ability to form alliances introduces an entirely new domain to master.
