Last year, Google announced that its AlphaGo software (part of its DeepMind project) had beaten the reigning three-time European Go champion, Fan Hui, winning five consecutive games; the work was published in the scientific journal Nature.[1] The tree search in the original AlphaGo evaluated positions and selected moves using neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Today the team reported[2] a new version of AlphaGo, “AlphaGo Zero”, that was developed without any human input data beyond the rules of the game. Previous versions of AlphaGo were initially trained on thousands of human amateur and professional games to learn how to play Go. AlphaGo Zero skipped this step and learnt to play simply by playing games against itself, starting from completely random play. AlphaGo Zero thus became its own teacher and, after three days, beat the previously published, champion-defeating version of AlphaGo by one hundred games to nil. After playing thirty million games over a period of forty days, AlphaGo Zero became even stronger, outperforming the version of AlphaGo known as “Master”, which has defeated some of the world’s best players, including the current world number one, Ke Jie.
Image from deepmind.com