Opponent modeling and exploitation in poker using evolved recurrent neural networks
As a classic example of imperfect information games, poker, in particular, Heads-Up No-Limit Texas Holdem (HUNL), has been studied extensively in recent years. A number of computer poker agents have been built with increasingly higher quality. While agents based on approximated Nash equilibrium have been successful, they lack the ability to exploit their opponents effectively. In addition, the performance of equilibrium strategies cannot be guaranteed in games with more than two players and multiple Nash equilibria. This dissertation focuses on devising an evolutionary method to discover opponent models based on recurrent neural networks. A series of computer poker agents called Adaptive System for Hold’Em (ASHE) were evolved for HUNL. ASHE models the opponent explicitly using Pattern Recognition Trees (PRTs) and LSTM estimators. The default and board-texture-based PRTs maintain statistical data on the opponent strategies at different game states. The Opponent Action Rate Estimator predicts the opponent’s moves, and the Hand Range Estimator evaluates the showdown value of ASHE’s hand. Recursive Utility Estimation is used to evaluate the expected utility/reward for each available action. Experimental results show that (1) ASHE exploits opponents with high to moderate level of exploitability more effectively than Nash-equilibrium-based agents, and (2) ASHE can defeat top-ranking equilibrium-based poker agents. Thus, the dissertation introduces an effective new method to building high-performance computer agents for poker and other imperfect information games. It also provides a promising direction for future research in imperfect information games beyond the equilibrium-based approach.