Sample efficient multiagent learning in the presence of Markovian agents
dc.contributor.advisor | Stone, Peter, 1971- | en |
dc.contributor.committeeMember | Mooney, Raymond | en |
dc.contributor.committeeMember | Plaxton, Greg | en |
dc.contributor.committeeMember | Ravikumar, Pradeep | en |
dc.contributor.committeeMember | Bowling, Michael | en |
dc.creator | Chakraborty, Doran | en |
dc.date.accessioned | 2013-02-14T20:51:50Z | en |
dc.date.issued | 2012-12 | en |
dc.date.submitted | December 2012 | en |
dc.date.updated | 2013-02-14T20:51:50Z | en |
dc.description | text | en |
dc.description.abstract | The problem of multiagent learning (or MAL) is concerned with the study of how agents can learn and adapt in the presence of other agents that are simultaneously adapting. The problem is often studied in the stylized settings provided by repeated matrix games. The goal of this thesis is to develop MAL algorithms for such a setting that achieve a new set of objectives which have not been previously achieved. The thesis makes three main contributions. The first main contribution proposes a novel MAL algorithm, called Convergence with Model Learning and Safety (or CMLeS), that is the first to achieve the following three objectives: (1) converges to following a Nash equilibrium joint-policy in self-play; (2) achieves close to the best response when interacting with a set of memory-bounded agents whose memory size is upper bounded by a known value; and (3) ensures an individual return that is very close to its security value when interacting with any other set of agents. The second main contribution proposes another novel MAL algorithm that models a significantly more complex class of agent behavior called Markovian agents, that subsumes the class of memory-bounded agents. Called Joint Optimization against Markovian Agents (or Joma), it achieves the following two objectives: (1) achieves a joint-return very close to the social welfare maximizing joint-return when interacting with Markovian agents; (2) ensures an individual return that is very close to its security value when interacting with any other set of agents. Finally, the third main contribution shows how a key subroutine of Joma can be extended to solve a broader class of problems pertaining to Reinforcement Learning, called ``Structure Learning in factored state MDPs". All of the algorithms presented in this thesis are well backed with rigorous theoretical analysis, including an analysis on sample complexity wherever applicable, as well as representative empirical tests. | en |
dc.description.department | Computer Science | |
dc.format.mimetype | application/pdf | en |
dc.identifier.uri | http://hdl.handle.net/2152/19459 | en |
dc.language.iso | en_US | en |
dc.subject | Artificial intelligence | en |
dc.subject | Multiagent learning | en |
dc.title | Sample efficient multiagent learning in the presence of Markovian agents | en |
thesis.degree.department | Computer Sciences | en |
thesis.degree.discipline | Computer Science | en |
thesis.degree.grantor | The University of Texas at Austin | en |
thesis.degree.level | Doctoral | en |
thesis.degree.name | Doctor of Philosophy | en |