The games were played from October 5 to 9, 2015, but the startling performance of AlphaGo wasn’t revealed until a paper detailing the feat was published in the science journal Nature on January 27, 2016.
This means we have only recently heard about the games and had a chance to analyze them.
Commented game record: Fan Hui vs DeepMind AlphaGo
Go has long been a significant challenge to artificial intelligence (AI) researchers, because the large number of possible Go games makes it infeasible for computers to perform well using brute force alone.
This has meant that the best human players have until now remained out of reach of the best computer players, despite decades of research into AI and advances in computing power.
Fan Hui is a professional with the Chinese Go Association and has been living in France, where he has taught and promoted Go since the early 2000s. He was born in 1981 and became a pro in 1996.
Google DeepMind contacted Fan to arrange the match and he played against AlphaGo in London, under the supervision of Toby Manning from the British Go Association.
Ten games were played in total: five official games and five unofficial games. Fan chose a time limit of one hour main time plus 3 × 30 seconds byo-yomi per player for the official games. He won two of the unofficial games against AlphaGo (played at 30 seconds per move), but lost all five official games.
The first game of the match was quite leisurely and territorial. After Fan lost that game by 2.5 points, he thought that perhaps AlphaGo didn’t like to fight, so he played more aggressively in the games that followed. Unfortunately for Fan, this game plan didn’t pay off.
AlphaGo is a Go AI developed by DeepMind — a British AI research company which was acquired by Google in 2014. The company is undertaking a self-described “Apollo Program for AI,” with more than 100 scientists working on the project.
In this context a ‘neural network’ is a technology for processing information and forming connections in a way that is modeled on the neural connections in the human brain.
The goal of this technology is to enable computers to learn in a way that is more general and humanlike.
DeepMind aims to develop a general learning algorithm which can be applied to many problems instead of a pre-programmed AI which is only capable of doing one thing (e.g. playing Go or chess).
The chess computer Deep Blue, which defeated chess grandmaster Garry Kasparov in 1997, is an example of the latter (pre-programmed) AI.
It appears that AlphaGo, as a stepping stone along this path, is currently something of a hybrid of the two approaches. The more general-purpose neural network has been ‘trained’ by giving it access to a huge number of Go games between skilled humans. The ‘knowledge’ it has acquired through this process has then been reinforced by letting it play an enormous number of games against itself and evaluate them using some serious hardware.
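To make the idea of ‘training’ more concrete, here is a toy example of a single supervised learning step in Python. A linear ‘policy’ over board features is nudged toward the move a human expert actually chose. The feature and move encodings here are hypothetical placeholders, and AlphaGo’s real policy network is a deep convolutional network, so treat this as a sketch of the principle rather than the implementation.

```python
import numpy as np

# Toy supervised step: a linear "policy" over board features is nudged
# toward the move a human expert chose. The 361-dimensional encodings
# are hypothetical placeholders, not AlphaGo's actual representation.
rng = np.random.default_rng(0)
NUM_FEATURES = NUM_MOVES = 361          # e.g. one feature / one move per point
W = rng.normal(0.0, 0.01, (NUM_FEATURES, NUM_MOVES))

def train_step(features, expert_move, lr=0.01):
    """One gradient-descent step on the cross-entropy loss."""
    global W
    logits = features @ W               # score every candidate move
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                # softmax over all moves
    grad = np.outer(features, probs)    # gradient of the loss w.r.t. W...
    grad[:, expert_move] -= features    # ...minus the one-hot expert move
    W -= lr * grad                      # move toward the expert's choice
```

Repeating this step over a large database of expert positions teaches the model to imitate human moves; the self-play stage then reinforces that knowledge by rewarding moves which actually lead to wins.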
However, its strength is further boosted by Monte Carlo Tree Search (MCTS) — a technique which has already been applied to Go for about a decade and has helped computer Go programs make great strides against amateur-level players.
MCTS applies a statistical approach to finding good moves. It is a search algorithm in which the computer simulates many possible games and, after seeing the result of each random game, aggregates the results to estimate the probability of success for each candidate move. If this sounds counter-intuitive, that’s because it is!
MCTS does not require a great deal of domain-specific knowledge (knowledge of Go provided by a human creator) to perform well, but a programmer still has to configure and tune the approach for the game in question. One of the problems AI researchers have faced with Go is that it’s difficult to evaluate whether a position is good or bad.
For example, you can’t assign scores to pieces like you can with chess, because the pieces all look the same. MCTS has, until now, evaluated positions by simulating all the way to the end of the game, counting the score, and then aggregating the results of many simulations.
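To make this concrete, here is a minimal sketch of ‘flat’ Monte Carlo move selection in Python: each candidate move is scored by playing many random games to the very end, exactly as described above. The GameState methods (copy, legal_moves, play, is_over, winner) are a hypothetical interface, and real MCTS additionally grows a search tree and balances exploring new moves against re-testing promising ones.

```python
import random

def simulate(state):
    """Play uniformly random moves to the end of the game and return
    the winner. `state` is assumed to expose copy(), legal_moves(),
    play(move), is_over() and winner() -- a hypothetical interface."""
    state = state.copy()
    while not state.is_over():
        state.play(random.choice(state.legal_moves()))
    return state.winner()

def flat_monte_carlo_move(state, player, playouts=100):
    """Score each legal move by the fraction of random playouts won,
    then pick the move with the best win rate."""
    best_move, best_rate = None, -1.0
    for move in state.legal_moves():
        child = state.copy()
        child.play(move)
        wins = sum(simulate(child) == player for _ in range(playouts))
        rate = wins / playouts
        if rate > best_rate:
            best_move, best_rate = move, rate
    return best_move
```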
Putting it all together
AlphaGo changes the way MCTS is applied by using a neural network to evaluate whether a position is good or bad. DeepMind has actually trained two neural networks as part of AlphaGo. The first, called the ‘policy network’, chooses promising-looking moves for deeper analysis — similar to what humans do when they rely on instinct.
The second, called the ‘value network’, specializes in positional judgment. The value network allows AlphaGo to evaluate a position without playing each simulation all the way to the end of the game. This makes MCTS work more efficiently than it did in the previous generation of Go AIs.
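Here is a minimal sketch of how the two networks might slot into a search, assuming hypothetical policy_net and value_net callables (AlphaGo’s real integration of the networks into MCTS is considerably more involved):

```python
def shallow_search(state, policy_net, value_net, depth=2, top_k=5):
    """Toy look-ahead using both networks: the policy network prunes
    the candidate moves, and the value network scores leaf positions
    directly instead of playing random games to the end.

    Hypothetical interfaces: policy_net(state) returns {move: prob},
    and value_net(state) returns a win probability in [0, 1] for the
    player to move. This is a simplification, not AlphaGo's real MCTS."""
    if depth == 0 or state.is_over():
        return value_net(state)
    probs = policy_net(state)
    # Keep only the most promising moves, as a human's instinct would.
    candidates = sorted(probs, key=probs.get, reverse=True)[:top_k]
    best = 0.0
    for move in candidates:
        child = state.copy()
        child.play(move)
        # The opponent moves next, so their win probability is inverted.
        best = max(best, 1.0 - shallow_search(child, policy_net, value_net,
                                              depth - 1, top_k))
    return best
```

Pruning candidates with the policy network keeps the branching factor manageable, while scoring leaves with the value network avoids playing thousands of full-length random games, which is where the efficiency gain comes from.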
The above is a relatively basic explanation of how AlphaGo works and may contain errors (though they will be happily corrected). For more detailed information about computer Go, please see:
- DeepMind’s paper explaining how AlphaGo works
- An interview with computer Go expert Martin Müller, discussing MCTS
- The computer Go mailing list
Follow the match between Lee Sedol and AlphaGo
Having defeated Fan Hui, AlphaGo has challenged Lee Sedol 9p to a match in March 2016.
Match details and frequent updates will be posted on the DeepMind AlphaGo vs Lee Sedol page.
If you would like to follow the match, you can click here to subscribe to our newsletter and receive free, weekly updates.