[The following is a guest article written by Ben Kloester.]
The Go world was shocked and intrigued in January, when news broke of DeepMind AlphaGo’s victory over top European pro Fan Hui 2p.
News of Fan’s defeat was met with awe, and in some cases even fear.
Since the publication of DeepMind’s paper [PDF] in Nature, and the release of the game records, professionals around the globe have had time to analyse AlphaGo’s play in more detail, and have drawn less sensational conclusions.
A consensus has emerged that although this is a great advance in computer Go ability, DeepMind would not be celebrating victory if it had been a top professional sitting across the Go board back in October.
Expert commentaries of Fan Hui vs AlphaGo, including Younggil’s SGF commentary and video commentary, Myungwan Kim’s epic video analysis, and even commentary from match referee Toby Manning [PDF] of the British Go Association (based partly on Fan’s comments), have all identified mistakes made by AlphaGo that a stronger player would have capitalized on.
Professionals have also pointed out some areas of perceived weakness for AlphaGo, and speculated on its potential limitations.
And let’s not forget that Fan actually won two of the ten games played.
All this suggests that the AI’s strength, as seen in October, was well below that of the top 100 ranked professionals.
Marching to different tunes
Predictions for the upcoming match mostly favour Lee Sedol; many expect the Korean professional to demonstrate the ongoing superiority of man over machine.
But are they right? Or does AlphaGo have more of a chance than they think?
DeepMind certainly seems to be more confident than the consensus predictions – at a Korean press conference Demis Hassabis suggested it would be “too close to call” – fifty-fifty odds.
David Silver’s comment, that he would be “very disappointed” if they lost, suggests even stronger confidence.
DeepMind counts among its number some strong amateur Go players, as well as pioneers in Go AI, so we’d expect them to be reasonable judges.
What do they know that the pros don’t?
Forget dog years, consider AI years!
Well, one thing that they, and no one else, know is what they’ve been doing since October. And six months is a looong time when you are talking about machine learning!
As Hassabis mentioned at the press conference, AlphaGo has already done the equivalent of over a thousand years of human playing!
The ability to rapidly play games and learn from them means that AlphaGo, like our furry friends, passes the years faster than we do.
But instead of the 7-to-1 of dog years, we must consider ‘AI years’, which might allow thousands of years of learning per human year!
Though it makes for a striking image, simply having AlphaGo play itself for six months is an oversimplification of what DeepMind are actually likely to be doing to strengthen the program.
There are several potential improvements that might give AlphaGo extra strength, from most to least likely to be carried out:
- Improving the accuracy of the value function
- Tweaking model parameters
- Adding more hand-crafted features to the rollout policy
- Retraining the supervised learning network on professional data
But before we delve into this in any more detail, it’s important to understand the source of AlphaGo’s strength that distinguishes it from other computer AIs.
Questioning your values
One of the things that separates professionals from strong amateurs is their ability to look at even a complex board position and tell who is ahead.
This question of the ‘value’ of a board position has been a non-trivial problem in computer Go since the field’s inception, and DeepMind’s solution to it is the main thing separating its program from other Go AIs.
Deterministic, zero-sum, perfect-information games (like Go) actually have an objective value function across all board positions, but Go has far too many combinations to ever calculate this value exactly.
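To make the idea of an objective value function concrete, here’s a minimal sketch (mine, not DeepMind’s) that computes the exact minimax value of a toy zero-sum game – a Nim pile where each player removes one or two stones, and whoever takes the last stone wins. Go’s value function exists in exactly the same sense; there are just unimaginably more positions to enumerate:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(stones):
    """+1 if the player to move can force a win, -1 if they are lost."""
    if stones == 0:
        return -1  # the previous player took the last stone; the mover has lost
    # Zero-sum: the opponent's value of a child position is the negation of
    # ours, so the mover picks the move minimizing the opponent's prospects.
    return max(-value(stones - take) for take in (1, 2) if take <= stones)

# Multiples of 3 are losses for the player to move:
print([value(n) for n in range(7)])  # [-1, 1, 1, -1, 1, 1, -1]
```

For a 7-stone pile this table is trivial to fill in; for a 19×19 Go board, the same recursion would have to visit more positions than there are atoms in the universe – hence the need to approximate.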
AlphaGo uses a neural network model to approximate the value function, and this model was created in three steps, building two other models along the way:
1. A ‘policy network’ (i.e. a model giving a probability distribution over possible moves), built using supervised learning (SL) – where the model makes a prediction, is given the correct answer, and adjusts itself to ‘learn’ from it – to predict a human’s move, given a board position. AlphaGo’s SL policy network successfully predicted human moves 57% of the time, when trained on 160,000 6–9 dan KGS games containing a total of 30 million board positions.
2. Another policy network, built by reinforcement learning (RL) – taking the supervised learning network and getting it to play successive versions of itself, learning from the game outcomes – to predict the move most likely to result in a victory. The RL policy played 1.28 million games against different versions of itself, resulting in a very strong policy network for selecting moves.
3. Finally, the ‘value network’, built by supervised learning and regression over board positions and outcomes generated using the SL and RL networks, which predicts the expected value (i.e. the probability of victory) of a board position. To do this, AlphaGo generated 30 million games, playing the first n−1 moves of each with the SL network, then selecting one random legal move, and then using the RL network to select all remaining moves until the game ended and the result (win/lose) was known. The value network was then trained on just one board position from each game – the one immediately after the random move – to minimize the error in its predicted value.
This complex process resulted in a value function that is closer to the ‘real’ value function for Go than anyone has ever achieved before.
In fact, using the value network alone, AlphaGo beat all other computer AIs!
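For the curious, the data-generation recipe described above can be sketched in a few lines of Python. Everything here is a toy stand-in – the Position class and the policies are hypothetical placeholders, not DeepMind’s actual code – but the shape of the procedure matches the paper: SL moves up to a random depth, one uniformly random move, RL moves to the end, and a single (position, outcome) pair kept per game:

```python
import random

class Position:
    """Toy stand-in for a Go board (hypothetical, not DeepMind's API):
    the 'game' is just a list of moves that ends after 10 of them,
    and the side playing more 1s 'wins'."""
    def __init__(self, moves=()):
        self.moves = tuple(moves)
    def legal_moves(self):
        return [0, 1]
    def play(self, move):
        return Position(self.moves + (move,))
    def is_over(self):
        return len(self.moves) >= 10
    def outcome(self):
        return 1 if sum(self.moves) > 5 else -1  # win/lose, i.e. +1/-1

def generate_training_pair(sl_policy, rl_policy, max_depth=9):
    """One self-play game -> exactly ONE (position, outcome) training pair."""
    pos = Position()
    n = random.randint(1, max_depth)          # depth of the random 'fork' move
    for _ in range(n - 1):                    # moves 1 .. n-1: SL policy
        pos = pos.play(sl_policy(pos))
    pos = pos.play(random.choice(pos.legal_moves()))  # move n: uniform random
    sample = pos                              # the single position recorded
    while not pos.is_over():                  # remaining moves: RL policy
        pos = pos.play(rl_policy(pos))
    return sample, pos.outcome()

# Toy policies that always play 1:
sample, result = generate_training_pair(lambda p: 1, lambda p: 1)
```

Keeping only one position per game, rather than every position, keeps the training examples de-correlated – successive positions within a single game are nearly identical, and training on all of them risks the network simply memorizing games.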
DeepMinding the gap
As a result of this, the value function is one area where DeepMind might be able to easily gain extra strength to close the gap with Lee Sedol.
By doing more reinforcement learning to build a better move policy, and then generating a much larger corpus of games and board positions and using them to retrain the value function, they could further improve its accuracy.
All that is really required to do this is time and computing power.
Split the difference
Another area where I’d expect DeepMind to at least experiment and perhaps obtain modest improvement, is in some of the structural aspects of their model.
AlphaGo uses a hybrid approach that combines the more traditional Monte Carlo Tree Search technique of semi-random playouts with the above-described value function to assess moves.
At the moment those two techniques are given equal weight, but the optimum balance may differ from that.
There are also several other modelling constants where trial-and-error tweaking is inexpensive and could improve results.
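For reference, the ‘equal weight’ referred to here is the mixing parameter λ from DeepMind’s paper: each leaf of the search tree is scored by blending the value network’s estimate with the result of a rollout from that leaf, and the published version used λ = 0.5, giving each half the say. A minimal sketch:

```python
def leaf_value(value_net_estimate, rollout_result, lam=0.5):
    """Blend the value network's estimate of a position with the outcome
    z (+1 win / -1 loss) of a fast rollout played from it.
    lam = 0.5 weights the two evaluations equally."""
    return (1 - lam) * value_net_estimate + lam * rollout_result

# Value network says 0.6, and the rollout from this leaf was a win:
print(leaf_value(0.6, 1.0))  # 0.8
```

Tuning λ is exactly the kind of cheap trial-and-error tweak mentioned above: lowering it trusts the value network more, raising it trusts the rollouts more.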
Better than (better than?) random
Like other Monte Carlo-based AIs, AlphaGo builds up a set of complete games from the current board state by playing lots of fast, random(ish) games all the way to the end to see who wins.
But they are only random-ish, because a ‘rollout policy’ is used to bias which moves are more likely to be explored.
AlphaGo’s rollout policy is built in a similar way to the SL policy, but is designed to be much (~1000x) faster.
Though this is less likely than the previous two options, DeepMind might try adding more hand-crafted features, or tweaking the existing rollout policy, to improve the Monte Carlo search.
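To see what a rollout policy buys you, here’s a sketch of a single guided playout step. The score function here is a hypothetical stand-in for AlphaGo’s fast, hand-crafted move features (the real thing is a linear model over small local patterns); nothing below is DeepMind’s actual code:

```python
import random

def playout_move(legal_moves, score):
    """Pick one move with probability proportional to its rollout-policy
    score, rather than uniformly at random."""
    weights = [score(move) for move in legal_moves]
    return random.choices(legal_moves, weights=weights, k=1)[0]

# A toy score table that strongly favours move 'b':
random.seed(0)
picks = [playout_move(['a', 'b', 'c'], {'a': 1, 'b': 8, 'c': 1}.get)
         for _ in range(1000)]
print(picks.count('b'))  # roughly 800 of the 1000 steps choose 'b'
```

Because thousands of playouts are run per move decision, the scoring function has to be extremely cheap – hence the ~1000x speed requirement, and why the rollout policy uses simple features rather than a deep network.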
Learn from the best
One other possibility suggested by several observers is that DeepMind could use kifu from professional games to get an edge.
This would involve going back to the start and retraining their supervised learning network, but instead of the KGS games, using pro games.
There are at least 80,000 pro games out there, around half the volume of KGS games used to train AlphaGo’s SL network.
However this approach seems unlikely, not only because it requires re-doing a lot of work, but also because if there were no obstacles, DeepMind would have used it to begin with.
Whether due to copyright uncertainties or some other reason, they will probably stick with the KGS dataset they’re already using, or add data from strong players on other Go servers (like Tygem).
What forest? All I see is trees
All of that may not be enough to beat Lee if professionals’ observations about AlphaGo’s weaknesses reflect inherent limitations in its approach or structure, rather than simply not yet having learnt from enough board positions.
The critiques all touch on similar themes, which centre around a lack of whole-board awareness, or high-level play.
Among the weaknesses suggested are a lack of understanding of sente, no useful conception of aji, and a lack of ‘creativity’ – a tendency to follow common patterns (albeit ones often used by strong amateurs or even pros) even where the specific context calls for deviation.
Myungwan Kim’s comment about “5 dan mistakes” seems particularly pertinent. It may make sense that AlphaGo struggles with whole-board interconnectedness, given the structure of its underlying models.
Convolutional neural networks (CNNs) are typically local by nature, and don’t build a good understanding of the whole board.
According to Rémi Coulom, AlphaGo’s architecture uses 1 layer of 5×5 convolution, and 11 layers of 3×3 convolution, meaning “it can propagate information at a distance of 13 points, but not more”.
“So if a large dragon has one eye each, on opposite sides of the board, the neural network is completely blind to figure out whether it is dead or alive.”
“Maybe it will suppose that any such large dragon must be alive, which works well in practice most of the time.”
Though AlphaGo is remarkable when compared to all previous Go-playing CNNs, this may still be a limitation for certain positions with important non-local interactions.
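Coulom’s 13-point figure is easy to check: a k×k convolution with stride 1 lets information spread ⌊(k−1)/2⌋ board points per layer, so one 5×5 layer plus eleven 3×3 layers gives 2 + 11 × 1 = 13:

```python
def propagation_distance(kernel_sizes):
    """Maximum number of board points a stone's influence can travel
    through a stack of stride-1 convolutional layers."""
    return sum((k - 1) // 2 for k in kernel_sizes)

layers = [5] + [3] * 11   # AlphaGo's network shape, per Remi Coulom
print(propagation_distance(layers))  # 13
```

A 19×19 board is 18 points across (and 25 along the diagonal, in convolutional terms), so two distant parts of one dragon really can sit outside each other’s receptive fields.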
A known unknown
Where does this leave us, in terms of predicting who will win next week? Sadly, there’s just no reliable way to know.
While it is clear that October’s AlphaGo would be most unlikely to win, it is also pretty clear that, despite speculation about its limitations, there is no obvious upper bound on how much AlphaGo may have improved by then.
Demis Hassabis and the DeepMind team have expressed quiet confidence.
Unfortunately, when we contacted Demis for this article, he was unable at this stage to answer further questions “until well after the match”.
Perhaps we’re best off heeding the words of someone who knows more than most about AlphaGo’s preparations, and sticking to their prediction.
After all, who can think of a more exciting match than one that’s too close to call?
Full details for the match and ongoing reports can be found on the DeepMind AlphaGo vs Lee Sedol page.
Who do you think will win?
Who do you think will win the upcoming match between Lee Sedol and AlphaGo?
Can Lee Sedol see off this challenge to the superiority of humans in Go?
Or do you think DeepMind is about to pull another rabbit out of a hat?
Share your prediction by leaving a comment below.