Adaptive representations for reinforcement learning
MetadataShow full item record
In reinforcement learning, an autonomous agent seeks an effective control policy for tackling a sequential decision task. Unlike in supervised learning, the agent never sees examples of correct or incorrect behavior but receives only a reward signal as feedback. One limitation of current methods is that they typically require a human to manually design a representation for the solution (e.g. the internal structure of a neural network). Since poor design choices can lead to grossly suboptimal policies, agents that automatically adapt their own representations have the potential to dramatically improve performance. This thesis introduces two novel approaches for automatically discovering high-performing representations. The first approach synthesizes temporal difference methods, the traditional approach to reinforcement learning, with evolutionary methods, which can learn representations for a broad class of optimization problems. This synthesis is accomplished via 1) on-line evolutionary computation, which customizes evolutionary methods to the on-line nature of most reinforcement learning problems, and 2) evolutionary function approximation, which evolves representations for the value function approximators that are critical to the temporal difference approach. The second approach, called adaptive tile coding, automatically learns representations based on tile codings, which form piecewise-constant approximations of value functions. It begins with coarse representations and gradually refines them during learning, analyzing the current policy and value function to deduce the best refinements. This thesis also introduces a novel method for devising input representations. In particular, it presents a way to find a minimal set of features sufficient to describe the agent’s current state, a challenge known as the feature selection problem. The technique, called Feature Selective NEAT is an extension to NEAT, a method for evolving neural networks used throughout this thesis. While NEAT evolves both the topology and weights of a neural network, FS-NEAT goes one step further by learning the network’s inputs too. Using evolution, it automatically and simultaneously determines the network’s inputs, topology, and weights. In addition to introducing these new methods, this thesis presents extensive empirical results in multiple domains demonstrating that these techniques can substantially improve performance over methods with manual representations.