AlphaGo Zero

Last year, I mentioned that DeepMind (Google) put together a computer that mastered Go, the game that was going to be one of the last where humans could consistently beat computers, because it had too many possible moves and was computationally intractable. Note the past tense. AlphaGo did not just beat the best human players in the world; it did so in ways that were so far beyond the human commentators that they thought it was making mistakes the whole time. But AlphaGo did lose one game, so DeepMind made a better version that stopped losing at all.

Two bonus notes:

  • This is not a supercomputer. The early versions were distributed networks, but the better versions are single machines.
  • The pace of progress is fast. 2015: an early version of AlphaGo gets the first computer win against a professional human player. 2016: the famous version of AlphaGo beats the human world champion. 2017: the latest version of AlphaGo beats that 2016 version 100:0.

AlphaGo Zero is that latest version. It is not yet perfect: there was a generation in between what I am calling the 2016 and 2017 versions, and that in-between version can still win against AlphaGo Zero 11% of the time. So AlphaGo Zero is better, but “vastly better than the previous best computer” is only so much of an improvement when that in-between version was literally better than any human. There is only so much better one can get at Go. You can only win so hard.

The really impressive thing about AlphaGo Zero is that no one taught it. The previous versions were taught by watching master players. AlphaGo Zero was given the rules of Go and left to play 5 million games against itself over the course of 3 days, at which point it was better than the best human in the world, having beaten the version that beat the human world champion. It took another 18 days (and 30 million more games?) to reach the level of the previous best that DeepMind had made with human guidance, then 19 more days to exceed all the old versions. I presume it is even better now, if they left the computer running. That was earlier this month, so short time frames kind of matter here.

Let’s say that again: tutoring previous generations of AlphaGo by showing them the best human games and players in history made them worse. AlphaGo Zero started from first principles on its own. It took 3 days to surpass every human and 40 days to surpass every previous computer. That is one computer with four TPUs. That is an amazing demonstration of the speed and power of machine learning. Here is AI researcher Eliezer Yudkowsky on the same topic.

: Zubon