DeepMind Technologies, the company behind the AlphaGo artificial intelligence (AI) that beat human Go masters, has created a new AI that is even more powerful. Unlike AlphaGo, AlphaGo Zero learns in a completely new way and requires less hardware to do it.
While AlphaGo was trained on many human Go games, AlphaGo Zero was given only the rules of the game and learned how to win by playing Go against itself.
In a recent paper published in the journal Nature, DeepMind researchers* say that after three days of self-play training, AlphaGo Zero defeated the version of AlphaGo that had beaten 18-time world champion Lee Sedol by 100 games to 0.
After 40 days of self-training, AlphaGo Zero outperformed the version of AlphaGo known as “Master”, which had defeated the world's best players, including world No. 1 Ke Jie.
In essence, AlphaGo Zero attained knowledge that took humans centuries to accumulate by experimenting and remembering successful moves, and it did so in a matter of days. Because it was not limited to learning from human games, AlphaGo Zero also developed unique strategies and moves of its own. It trained more quickly than previous systems and used less computing power: four tensor processing units (TPUs), the custom chips Google developed for machine learning. The version of AlphaGo that defeated Lee required 48 TPUs to train, though AlphaGo Master also used four TPUs.
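The core idea of learning by self-play, experimenting, and remembering which moves led to wins can be illustrated with a toy sketch. This is not DeepMind's actual algorithm (AlphaGo Zero combines a deep neural network with Monte Carlo tree search); it is a minimal, assumed-for-illustration example in which a tabular agent learns the simple game of Nim (take 1-3 stones from a pile; whoever takes the last stone wins) purely by playing games against itself:

```python
import random

def self_play_nim(episodes=5000, pile=10, seed=0):
    """Toy self-play learner for Nim, for illustration only.

    The agent plays both sides, records every (stones_left, move)
    decision, and after each game credits the moves made by the
    winner. Over many games, the recorded win rates steer it toward
    stronger moves -- learning from its own play, not human games.
    """
    rng = random.Random(seed)
    wins = {}   # (stones_left, move) -> games won after making this move
    plays = {}  # (stones_left, move) -> times this move was tried

    def win_rate(stones, move):
        return wins.get((stones, move), 0) / max(plays.get((stones, move), 1), 1)

    for _ in range(episodes):
        stones, player, history = pile, 0, []
        while stones > 0:
            moves = [m for m in (1, 2, 3) if m <= stones]
            # Mostly exploit the best-known move, sometimes experiment.
            if rng.random() < 0.2:
                move = rng.choice(moves)
            else:
                move = max(moves, key=lambda m: win_rate(stones, m))
            history.append((player, stones, move))
            stones -= move
            player = 1 - player
        winner = 1 - player  # the player who took the last stone
        for p, s, m in history:
            plays[(s, m)] = plays.get((s, m), 0) + 1
            if p == winner:
                wins[(s, m)] = wins.get((s, m), 0) + 1

    # Greedy policy extracted from the learned win-rate table.
    def best_move(stones):
        moves = [m for m in (1, 2, 3) if m <= stones]
        return max(moves, key=lambda m: win_rate(stones, m))

    return best_move
```

After a few thousand self-play games the agent reliably takes the whole pile whenever an immediate win is available, for example taking 2 stones when 2 remain. The real system replaces this lookup table with a deep network and guides each move with tree search, but the feedback loop, play yourself, see who won, reinforce the winner's moves, is the same in spirit.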
This paves the way for AI to surpass human effort and find new solutions to existing problems, Deepmind said.
[Figure] Source: DeepMind blog. AlphaGo is more power-efficient thanks to hardware gains and algorithmic evolution.
Download the research paper (PDF)
*The work was done by David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel and Demis Hassabis.