Browsing by Subject "Sequential decision making"
Now showing 1 - 3 of 3
Item: Learning for autonomy in the wild: theory, algorithms, and practice (2023-08)
Djeumou, Franck; Topcu, Ufuk; Chinchali, Sandeep; Fridovich-Keil, David; Zhang, Amy; Putot, Sylvie; Lennon, Craig

How can autonomous systems learn to operate in the wild, i.e., in complex, dynamic, and uncertain real-world environments? Despite recent and significant breakthroughs in artificial intelligence, there remains a tremendous gap between its current capabilities and what is needed to develop systems that can operate autonomously in the wild. We aim to bridge this gap by addressing several key challenges of learning in the wild: learning from extremely scarce data, learning safely from a single and ongoing trial, generalizing to unseen situations, and learning with uncertainty-awareness and explainability considerations for trustworthy human-robot interactions. We take an opinionated approach to these challenges and argue that data are never the only source of knowledge available during training, and that modern learning techniques should not treat them as such. Instead, we demonstrate that merging the efficiency of modern learning techniques at extracting patterns from data with existing knowledge of how the world works is key for autonomous systems to achieve learning in the wild. This knowledge may stem from structural knowledge, such as fundamental principles of physics; qualitative expert knowledge, such as design or mechanical constraints; or contextual knowledge, such as formal specifications of the underlying task. By incorporating such prior knowledge into learning through formal techniques, we propose data-driven modeling and control approaches that enable autonomous systems to operate even with severely limited data, such as streaming data from a single and ongoing trial.
We additionally demonstrate that these data-driven approaches generalize beyond the training regime, improve explainability over traditional black-box models, and exhibit principled uncertainty awareness. Specifically, we focus on theoretical analyses that quantify the benefits of exploiting prior knowledge as an inductive bias in terms of data efficiency, safety, computational requirements, and optimality of learning. We derive these analyses through novel ideas at the intersection of control, learning, and formal methods. Building on the theoretical insights, we develop practical and computationally efficient algorithms, some of which come with provable performance, real-time, and safety guarantees. To validate the effectiveness of our algorithms, we conduct experiments in high-fidelity robotics and flight simulators, as well as on real-world hardware such as a Toyota Supra car and a custom-built hexacopter. Remarkably, when applied in real-world settings, our algorithms deliver high performance on control tasks that push the system beyond the limits of the prior knowledge and data coverage, despite being trained on only a handful of system trajectories or a few minutes' worth of data.

Item: Learning methods for sequential decision making with imperfect representations (2011-12)
Kalyanakrishnan, Shivaram, 1983-; Stone, Peter, 1971-; Mooney, Raymond J.; Miikkulainen, Risto; Ballard, Dana H.; Parr, Ronald

Sequential decision making from experience, or reinforcement learning (RL), is a paradigm well suited to agents seeking to optimize long-term gain as they carry out sensing, decision, and action in an unknown environment. RL tasks are commonly formulated as Markov Decision Problems (MDPs). Learning in finite MDPs enjoys several desirable properties, such as convergence, sample efficiency, and the ability to realize optimal behavior.
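The finite-MDP setting invoked here can be illustrated with a minimal toy sketch (my own example, not from the dissertation): tabular Q-learning on a five-state chain, where the state and action sets are fully enumerable and the greedy policy converges to optimal behavior.

```python
import numpy as np

# Toy sketch: tabular Q-learning on a deterministic chain MDP with
# enumerable states and actions -- the "perfect representation" setting
# in which value-based learning enjoys convergence guarantees.
def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))              # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        for _step in range(1000):            # step cap keeps early episodes finite
            if s == n_states - 1:            # rightmost state is terminal
                break
            if rng.random() < eps or Q[s, 0] == Q[s, 1]:
                a = int(rng.integers(2))     # explore / break ties randomly
            else:
                a = int(Q[s].argmax())
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            target = r + gamma * (0.0 if s2 == n_states - 1 else Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q

Q = q_learning_chain()
greedy = Q.argmax(axis=1)    # learned policy: move right in every non-terminal state
```

With a perfect (tabular) representation the learned greedy policy is optimal; the dissertation's point is that real-world tasks rarely admit such an enumeration.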
Key to achieving these properties is access to a perfect representation, under which the state and action sets of the MDP can be enumerated. Unfortunately, RL tasks encountered in the real world commonly suffer from state aliasing and nearly always demand generalization. As a consequence, learning in practice invariably amounts to learning with imperfect representations. In this dissertation, we examine the effect of imperfect representations on different classes of learning methods and introduce techniques to improve their practical performance. We make four main contributions. First, we introduce "parameterized learning problems," a novel experimental methodology that facilitates systematic control of representational aspects such as state aliasing and generalization. Applying this methodology, we compare the class of on-line value function-based (VF) methods with the class of policy search (PS) methods; the results reveal clear patterns in how representation affects each class. Our second contribution is a deeper analysis of the limits representations impose on VF methods; specifically, we provide a plausible explanation for the relatively poor performance of these methods on Tetris, the popular video game. The third major contribution of this dissertation is a formal study of the "subset selection" problem in multi-armed bandits. This problem, which directly affects the sample efficiency of several commonly used PS methods, also finds application in areas as diverse as industrial engineering and on-line advertising. We present new algorithms for subset selection and bound their performance under different evaluation criteria; under a PAC setting, our sample complexity bounds improve upon existing ones. As its fourth contribution, this dissertation introduces two hybrid learning architectures for combining the strengths of VF and PS methods.
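As a hedged illustration of the subset selection problem in multi-armed bandits (the function names and the Hoeffding-style sample size below are my own simplified sketch, not the dissertation's algorithms): pull every arm equally often, then return the empirical top-m arms, with the number of pulls per arm chosen so the selection is accurate with high probability.

```python
import math
import random

# Simplified sketch: uniform-sampling subset selection in a multi-armed
# bandit. Each arm is pulled n times, where n is a Hoeffding-style bound
# chosen so that the empirical top-m arms are near-optimal with
# probability at least 1 - delta.
def select_subset(pull, n_arms, m, eps=0.1, delta=0.05, seed=0):
    rng = random.Random(seed)
    n = math.ceil((2.0 / eps**2) * math.log(2 * n_arms / delta))
    means = [sum(pull(a, rng) for _ in range(n)) / n for a in range(n_arms)]
    ranked = sorted(range(n_arms), key=lambda a: means[a], reverse=True)
    return set(ranked[:m])

# Bernoulli arms with true means 0.9, 0.8, 0.3, 0.2; best 2-subset is {0, 1}.
p = [0.9, 0.8, 0.3, 0.2]
def pull(a, rng):
    return 1.0 if rng.random() < p[a] else 0.0

best_two = select_subset(pull, n_arms=4, m=2)
```

The dissertation's contribution is precisely to beat this kind of naive uniform allocation: adaptive algorithms can achieve the same PAC guarantee with fewer pulls.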
Under one architecture, these methods are applied in sequence; under the other, they are applied to separate components of a compound task. We demonstrate the effectiveness of these architectures on a complex simulation of robot soccer. In sum, this dissertation makes philosophical, analytical, and methodological contributions toward the development of robust and automated learning methods for sequential decision making with imperfect representations.

Item: Learning optimal sampling policies for sketching of huge data matrices (2021-05-09)
Heo, Taemin; Bajaj, Chandrajit

This study presents two methods, TSGPR-SketchyCoreSVD and SAC-SketchyCoreSVD, that improve the protocol for subsampling data fibers used to build random sketches and a low-rank SVD of a data matrix, by formulating fiber selection as a sequential decision-making problem. An agent progressively decides which data fibers to subsample next so as to maximize the accuracy of the low-rank SVD under limited computational resources. Using the information in the partially observed data matrix assembled from the fibers subsampled so far, the methods learn an optimal sampling policy: TSGPR-SketchyCoreSVD combines Thompson sampling with Gaussian process regression, while SAC-SketchyCoreSVD applies Soft Actor-Critic. Experiments show that TSGPR-SketchyCoreSVD actively learns the subsampling policy and achieves higher accuracy than the original SketchyCoreSVD. SAC-SketchyCoreSVD is still under development, but intermediate results are also promising. The approach extends naturally to higher-order data tensors.
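To make the sequential-decision formulation of fiber subsampling concrete, here is a heavily simplified toy sketch (hypothetical names and structure; this is not the TSGPR-SketchyCoreSVD algorithm): a Beta-Bernoulli Thompson sampler decides which column block of a matrix to subsample next, rewards steer sampling toward high-energy fibers, and the collected columns seed a low-rank SVD of the partially observed matrix.

```python
import numpy as np

# Toy sketch: Thompson sampling over column blocks of a data matrix.
# Reward is 1 when the drawn column carries above-median energy, so the
# sampler concentrates its budget on the informative block; the chosen
# columns then feed a truncated SVD of the subsampled matrix.
def ts_column_sampling(A, blocks, budget=60, seed=0):
    rng = np.random.default_rng(seed)
    thresh = np.median(np.sum(A**2, axis=0))       # energy threshold for reward
    alpha = np.ones(len(blocks))                   # Beta posterior successes
    beta = np.ones(len(blocks))                    # Beta posterior failures
    chosen = []
    for _ in range(budget):
        k = int(np.argmax(rng.beta(alpha, beta)))  # sample a block index
        j = int(rng.choice(blocks[k]))             # subsample a fiber (column)
        chosen.append(j)
        if np.sum(A[:, j]**2) > thresh:
            alpha[k] += 1
        else:
            beta[k] += 1
    U, s, _ = np.linalg.svd(A[:, sorted(set(chosen))], full_matrices=False)
    return chosen, U[:, :2], s[:2]

rng = np.random.default_rng(1)
A = np.zeros((40, 20))
A[:, :10] = rng.normal(scale=3.0, size=(40, 10))   # high-energy block
A[:, 10:] = rng.normal(scale=0.1, size=(40, 10))   # low-energy block
blocks = [list(range(10)), list(range(10, 20))]
chosen, U2, s2 = ts_column_sampling(A, blocks)
frac_informative = sum(j < 10 for j in chosen) / len(chosen)
```

The actual methods replace this crude binary reward with Gaussian process regression over SVD accuracy (TSGPR) or a Soft Actor-Critic policy (SAC); the sketch only shows the bandit-style sampling loop they share.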