Browsing by Subject "Multiagent learning"
Now showing 1 - 2 of 2
Item: Cooperation and communication in multiagent deep reinforcement learning (2016-12)
Hausknecht, Matthew John; Stone, Peter, 1971-; Ballard, Dana; Mooney, Ray; Miikkulainen, Risto; Singh, Satinder

Reinforcement learning is the area of machine learning concerned with learning which actions to execute in an unknown environment in order to maximize cumulative reward. As agents begin to perform tasks of genuine interest to humans, they will be faced with environments too complex for humans to predetermine the correct actions using hand-designed solutions. Instead, capable learning agents will be necessary to tackle complex real-world domains. However, traditional reinforcement learning algorithms have difficulty with domains featuring 1) high-dimensional continuous state spaces, for example pixels from a camera image, 2) high-dimensional parameterized-continuous action spaces, 3) partial observability, and 4) multiple independent learning agents. We hypothesize that deep neural networks hold the key to scaling reinforcement learning towards complex tasks. This thesis seeks to answer the following two-part question: 1) How can the power of deep neural networks be leveraged to extend reinforcement learning to complex environments featuring partial observability, high-dimensional parameterized-continuous state and action spaces, and sparse rewards? 2) How can multiple deep reinforcement learning agents learn to cooperate in a multiagent setting?

To address the first part of this question, this thesis explores the idea of using recurrent neural networks to combat the partial observability experienced by agents in the domain of Atari 2600 video games. Next, we design a deep reinforcement learning agent capable of discovering effective policies for the parameterized-continuous action space found in the Half Field Offense simulated soccer domain.

To address the second part of this question, this thesis investigates architectures and algorithms suited for cooperative multiagent learning. We demonstrate that sharing parameters and memories between deep reinforcement learning agents fosters policy similarity, which can result in cooperative behavior. Additionally, we hypothesize that communication can further aid cooperation, and we present the Grounded Semantic Network (GSN), which learns a communication protocol grounded in the observation space and reward function of the task. In general, we find that the GSN is effective on domains featuring partial observability and asymmetric information.

All in all, this thesis demonstrates that reinforcement learning combined with deep neural network function approximation can produce algorithms capable of discovering effective policies for domains with partial observability, parameterized-continuous action spaces, and sparse rewards. Additionally, we demonstrate that single-agent deep reinforcement learning algorithms can be naturally extended towards cooperative multiagent tasks featuring learned communication. These results represent a non-trivial step towards extending agent-based AI towards complex environments.
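The first thread of this abstract rests on recurrent state estimation: under partial observability an agent cannot choose good actions from the latest observation alone, so the Q-network carries a recurrent hidden state across a history of observations. Below is a minimal sketch of that idea in PyTorch; it is not the thesis code, and the class name (RecurrentQNetwork), layer sizes, and dimensions are illustrative assumptions. In the multiagent setting the abstract describes, parameter sharing corresponds to letting every agent act with the same such network, which is what fosters the policy similarity mentioned above.

    # Minimal sketch (assumed names and sizes), not the thesis implementation:
    # an LSTM layer lets the agent integrate observations over time, which helps
    # under partial observability where a single frame is insufficient.
    import torch
    import torch.nn as nn

    class RecurrentQNetwork(nn.Module):
        def __init__(self, obs_dim, num_actions, hidden_dim=64):
            super().__init__()
            self.encoder = nn.Linear(obs_dim, hidden_dim)                   # encode one observation
            self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)   # memory across timesteps
            self.q_head = nn.Linear(hidden_dim, num_actions)                # Q-value per discrete action

        def forward(self, obs_seq, hidden=None):
            # obs_seq: (batch, time, obs_dim) -- a history of observations, not just the latest one
            x = torch.relu(self.encoder(obs_seq))
            x, hidden = self.lstm(x, hidden)
            return self.q_head(x), hidden    # Q-values at every timestep, plus the recurrent state

    # Usage: Q-values for a batch of 8 four-step observation histories
    net = RecurrentQNetwork(obs_dim=16, num_actions=4)
    q_values, h = net(torch.randn(8, 4, 16))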
Item: Sample efficient multiagent learning in the presence of Markovian agents (2012-12)
Chakraborty, Doran; Stone, Peter, 1971-; Mooney, Raymond; Plaxton, Greg; Ravikumar, Pradeep; Bowling, Michael

The problem of multiagent learning (or MAL) is concerned with the study of how agents can learn and adapt in the presence of other agents that are simultaneously adapting. The problem is often studied in the stylized settings provided by repeated matrix games.

The goal of this thesis is to develop MAL algorithms for such a setting that achieve a new set of objectives which have not been previously achieved. The thesis makes three main contributions. The first main contribution proposes a novel MAL algorithm, called Convergence with Model Learning and Safety (or CMLeS), that is the first to achieve the following three objectives: (1) it converges to following a Nash equilibrium joint-policy in self-play; (2) it achieves close to the best response when interacting with a set of memory-bounded agents whose memory size is upper bounded by a known value; and (3) it ensures an individual return that is very close to its security value when interacting with any other set of agents. The second main contribution proposes another novel MAL algorithm, Joint Optimization against Markovian Agents (or Joma), which models a significantly more complex class of agent behavior, called Markovian agents, that subsumes the class of memory-bounded agents. Joma achieves the following two objectives: (1) it achieves a joint return very close to the social-welfare-maximizing joint return when interacting with Markovian agents; (2) it ensures an individual return that is very close to its security value when interacting with any other set of agents. Finally, the third main contribution shows how a key subroutine of Joma can be extended to solve a broader class of problems pertaining to reinforcement learning, called "structure learning in factored-state MDPs". All of the algorithms presented in this thesis are backed by rigorous theoretical analysis, including an analysis of sample complexity wherever applicable, as well as representative empirical tests.
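To make the memory-bounded setting concrete: an opponent with memory size k conditions its next action only on the last k joint actions, so a learner that estimates that conditional action distribution can play a best response to its prediction. The sketch below illustrates only that modeling idea; it is not CMLeS or Joma, and the payoff matrix, memory bound, and function names are assumptions made for the example.

    # Minimal sketch (assumed setup), not the thesis algorithms: best-responding to a
    # memory-bounded opponent in a repeated matrix game by estimating the opponent's
    # action distribution conditioned on the last k joint actions.
    from collections import defaultdict, deque
    import numpy as np

    payoff = np.array([[3.0, 0.0],    # our payoff: rows = our action, cols = opponent action
                       [5.0, 1.0]])   # (a prisoner's-dilemma-like example)

    k = 1                                                     # assumed bound on opponent memory
    counts = defaultdict(lambda: np.ones(payoff.shape[1]))    # smoothed counts per history
    history = deque(maxlen=k)                                 # last k joint actions

    def act():
        """Play a best response to the predicted opponent action for the current history."""
        probs = counts[tuple(history)]
        probs = probs / probs.sum()
        expected = payoff @ probs          # expected payoff of each of our actions
        return int(np.argmax(expected))

    def observe(my_action, opp_action):
        """Update the opponent model and the joint-action history after each round."""
        counts[tuple(history)][opp_action] += 1
        history.append((my_action, opp_action))

    # One round: choose an action, see the opponent's action, update the model
    a = act()
    observe(a, opp_action=0)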