Luca Pasqualini (University of Siena)
Feb 12, 2020 – 11:00 AM
DIISM, Artificial Intelligence laboratory (room 201), Siena SI
AlphaGo (AG) and all of its derivatives can play with superhuman strength because they are able to predict the win-loss outcome with great accuracy.
However, Go as a game is decided by a final score, and in final positions AG plays suboptimal moves. This is not surprising, since AG is completely unaware of the final score: all winning final positions are equivalent from the winrate perspective.
This can be an issue, for instance when trying to learn the “best” move or when playing with an initial handicap. Moreover, there is the theoretical quest for the “perfect game”.
This leads to the natural question: is it possible to train a successful DRL agent to predict scores instead of winrates? No empirical or theoretical evidence can be found in the literature to support the folklore statement that “this doesn’t work”.
In order to find evidence for this statement, or to disprove it, an AG-like software, “Leela Zero Score”, is presented. This software is built on the open source project Leela Zero and is trained to predict scores instead of winrates.
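To make the distinction concrete, here is a minimal sketch of the two kinds of value head the abstract contrasts. This is a hypothetical illustration, not Leela Zero Score's actual architecture: the function names, the 8-dimensional feature vector, and the score range are all assumptions. A winrate head outputs a single win probability, so every won position looks identical; a score head outputs a distribution over candidate final scores, from which an expected score can be read off.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the same board features feed two alternative heads.
features = rng.standard_normal(8)

def winrate_head(features, w):
    """Scalar win probability in (0, 1); all winning positions look alike."""
    return 1.0 / (1.0 + np.exp(-features @ w))

def score_head(features, W):
    """Softmax distribution over candidate final-score buckets."""
    logits = features @ W
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

buckets = np.arange(-10, 11)  # illustrative score range, not Go's real one
w = rng.standard_normal(8)
W = rng.standard_normal((8, buckets.size))

p_win = winrate_head(features, w)        # one number: P(win)
p_score = score_head(features, W)        # a full distribution over scores
expected_score = float(p_score @ buckets)
```

Under this sketch, two positions with the same `p_win` can still differ in `expected_score`, which is exactly the information a winrate-only agent discards.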