Browsing by Subject "reinforcement learning"
Now showing 1 - 7 of 7
Item: Agent-Based Markov Modeling for Improved COVID-19 Mitigation Policies (Journal of Artificial Intelligence Research, 2021-08)
Capobianco, Roberto; Kompella, Varun; Ault, James; Sharon, Guni; Jong, Stacy; Fox, Spencer; Meyers, Lauren; Wurman, Peter R.; Stone, Peter

Item: Balancing Individual Preferences and Shared Objectives in Multiagent Reinforcement Learning (International Joint Conference on Artificial Intelligence, 2020-07)
Durugkar, Ishan; Liebman, Elad; Stone, Peter

Item: Efficient Representation for Electric Vehicle Charging Station Operations using Reinforcement Learning (Hawaii Intl. Conf. System Sciences, 2022-01)
Kwon, Kyung-bin; Zhu, Hao

Item: Optimal Dynamic Treatment Regime by Reinforcement Learning in Clinical Medicine (2020)
Song, Mina; Han, David

Precision medicine allows a personalized treatment regime for patients with distinct clinical histories and characteristics. A dynamic treatment regime implements a reinforcement learning algorithm to produce the optimal personalized treatment regime in clinical medicine. Reinforcement learning applies when an agent takes actions in response to an environment that changes over time. Q-learning is one of the most popular methods for developing the optimal dynamic treatment regime: it fits linear outcome models in a recursive fashion. Despite its ease of implementation and interpretability for domain experts, Q-learning is limited by the risk of misspecifying the linear outcome model. Recently, algorithms more robust to model misspecification have been developed. For example, the inverse probability weighted estimator overcomes this problem by using a nonparametric model that assigns different weights to the observed outcomes when estimating the mean outcome, while the augmented inverse probability weighted estimator combines information from both the propensity model and the mean outcome model. Current statistical methods for producing the optimal dynamic treatment regime, however, allow only a binary action space. In clinical practice, combinations of treatments are often required, giving rise to a multi-dimensional action space. This study develops and demonstrates a practical way to accommodate a multi-level action space, utilizing currently available computational methods for the practice of precision medicine.
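The recursive Q-learning procedure described in this abstract can be sketched compactly. Below is a minimal two-stage illustration using ordinary least squares via numpy; the simulated data-generating process and all variable names (X1, A1, X2, A2, Y) are invented for this example and are not from the study.

```python
# A minimal sketch of two-stage Q-learning for a dynamic treatment regime,
# using simulated data and ordinary least squares. All data and names here
# are illustrative assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Stage 1: baseline covariate and binary treatment
X1 = rng.normal(size=n)
A1 = rng.integers(0, 2, size=n)
# Stage 2: covariate influenced by stage-1 history, second binary treatment
X2 = 0.5 * X1 + 0.3 * A1 + rng.normal(size=n)
A2 = rng.integers(0, 2, size=n)
# Final outcome (larger is better), with treatment-by-covariate interactions
Y = X1 + A1 * (1.0 - X1) + A2 * (0.5 + X2) + rng.normal(size=n)

def ols(design, y):
    """Least-squares coefficients for a linear outcome model."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Stage 2: fit Q2(X2, A2) = b0 + b1*X2 + A2*(b2 + b3*X2)
D2 = np.column_stack([np.ones(n), X2, A2, A2 * X2])
b = ols(D2, Y)

# Optimal stage-2 rule: treat when the treatment contrast b2 + b3*X2 > 0
contrast2 = b[2] + b[3] * X2
A2_opt = (contrast2 > 0).astype(float)

# Pseudo-outcome: predicted outcome had the optimal stage-2 action been taken
pseudo = D2 @ b - (A2 - A2_opt) * contrast2

# Stage 1: fit Q1(X1, A1) against the pseudo-outcome, recursively
D1 = np.column_stack([np.ones(n), X1, A1, A1 * X1])
g = ols(D1, pseudo)
print("Stage-1 rule: treat when", g[2], "+", g[3], "* X1 > 0")
```

The recursive step is the pseudo-outcome: the stage-1 model is fit not to the observed outcome but to the predicted outcome under the optimal stage-2 action, which is what makes the regime dynamic. This is also where the misspecification risk noted in the abstract enters: if the linear stage-2 model is wrong, the error propagates backward.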
Item: Reinforcement Learning for Generating Toolpaths in Additive Manufacturing (University of Texas at Austin, 2018)
Patrick, Steven; Nycz, Andrzej; Noakes, Mark

Generating toolpaths plays a key role in additive manufacturing processes. In the case of 3-Dimensional (3D) printing, these toolpaths are the paths the printhead follows to fabricate a part layer by layer. Most toolpath generators use nearest neighbor (NN), branch-and-bound, or linear programming algorithms to produce valid toolpaths. These algorithms often produce suboptimal results or cannot handle large sets of traveling points. In this paper, the researchers at Oak Ridge National Laboratory's (ORNL) Manufacturing Demonstration Facility (MDF) propose using a machine learning (ML) approach called reinforcement learning (RL) to produce toolpaths for a print. RL is the process by which two agents, the actor and the critic, learn to maximize a score based on the actions the actor takes in a defined state space. In the context of 3D printing, the actor learns to find the optimal toolpath, one that reduces printhead lifts and global print time.

Item: Reinforcement Learning for Optimization of COVID-19 Mitigation Policies (2020-10)
Kompella, Varun; Capobianco, Roberto; Jong, Stacy; Browne, Jonathan; Fox, Spencer J.; Meyers, Lauren Ancel; Wurman, Peter; Stone, Peter

The year 2020 has seen the COVID-19 virus lead to one of the worst global pandemics in history. As a result, governments around the world are faced with the challenge of protecting public health while keeping the economy running to the greatest extent possible. Epidemiological models provide insight into the spread of these types of diseases and predict the effects of possible intervention policies. However, to date, even the most data-driven intervention policies rely on heuristics. In this paper, we study how reinforcement learning (RL) can be used to optimize mitigation policies that minimize the economic impact without overwhelming hospital capacity. Our main contributions are (1) a novel agent-based pandemic simulator which, unlike traditional models, is able to model fine-grained interactions among people at specific locations in a community; and (2) an RL-based methodology for optimizing fine-grained mitigation policies within this simulator. Our results validate both the overall simulator behavior and the learned policies under realistic conditions.
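The paper's agent-based simulator is not reproduced here, but the shape of the optimization loop it feeds can be sketched with a toy stand-in. The PandemicEnv class below is a deliberately crude SIR-style model invented for illustration; its name, the bed-capacity fraction, and the reward weights are all assumptions, and the tabular Q-learning loop only shows the generic interaction pattern, not the paper's method.

```python
# A toy sketch of RL over a pandemic simulator: tabular Q-learning choosing
# among discrete mitigation levels each week. Everything below is an
# illustrative stand-in, not the paper's agent-based simulator or policy.
import numpy as np

class PandemicEnv:
    """Hypothetical coarse SIR-style environment (assumed, for illustration)."""
    def __init__(self, beds=0.05):
        self.beds = beds  # hospital capacity as a fraction of the population
        self.reset()

    def reset(self):
        self.s, self.i = 0.99, 0.01  # susceptible and infected fractions
        self.week = 0
        return self._obs()

    def _obs(self):
        # Discretize infection prevalence into 10 buckets for tabular learning
        return min(int(self.i * 100), 9)

    def step(self, action):
        # action in {0,...,4}: mitigation stringency; stricter means less spread
        beta = 0.5 * (1.0 - 0.2 * action)
        new_infections = beta * self.s * self.i
        self.i += new_infections - 0.1 * self.i  # 0.1 = weekly recovery rate
        self.s -= new_infections
        self.week += 1
        # Reward trades off the economic cost of stringency against a large
        # penalty whenever hospitalizations (20% of infected) exceed capacity
        reward = -0.1 * action - (10.0 if 0.2 * self.i > self.beds else 0.0)
        return self._obs(), reward, self.week >= 52

env = PandemicEnv()
Q = np.zeros((10, 5))               # states x actions
alpha, gamma, eps = 0.1, 0.95, 0.1  # step size, discount, exploration rate
rng = np.random.default_rng(0)

for episode in range(500):
    state, done = env.reset(), False
    while not done:
        # Epsilon-greedy action selection
        action = rng.integers(5) if rng.random() < eps else int(Q[state].argmax())
        next_state, reward, done = env.step(action)
        # One-step Q-learning update
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state
```

The paper optimizes far richer, fine-grained policies inside its simulator; this loop only illustrates how a mitigation policy can be learned against simulated epidemic dynamics with a reward that balances economic cost against hospital load.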
Item: Reinforcement learning strategies support generalization of learned hierarchical knowledge (2021)
McKee, Connor; Preston, Alison

In our everyday lives, we must learn and use context-specific information to inform our decision making. How do we learn which choices to make based on our memories? Prior rodent work has demonstrated that, after learning, knowledge becomes organized hierarchically in a context-dependent manner. Here, we quantify the emergence of context-dependent hierarchical knowledge during learning and examine the flexible use of that knowledge to generalize across different scenarios. Participants learned about objects with context-dependent reward values in an X-shaped virtual environment consisting of an elongated, contextually varying hallway with decision points on either end. First, participants learned the context-dependent object-reward pairings for one set of three objects. Next, they learned the pairings for a new set of three objects. We hypothesized that prior knowledge of the hierarchical structure would generalize to the second set of objects, as evidenced by faster learning. Participants gradually acquired the context-dependent object-reward pairings. When introduced to the new object set, learning rates did not significantly differ, indicating generalization of the hierarchical reward structure to the new objects. To further quantify how decision making unfolded, we applied three types of reinforcement learning (RL) models to our behavioral data: model-free, model-based (MB), and a combined model-based-model-free (MBMF) hybrid. The MB model performed best at using participants' past selections to predict future decisions and reward-value expectations, indicating that current decisions were guided by prior selections. The MBMF model best captured changes in participant learning across runs, possibly because it can assess different learning strategies. Overall, our results demonstrate that participants learned to flexibly decide which actions were most adaptive, promoting correct decision making in a given context. Furthermore, the structure of prior knowledge may support the generalization of learned experience.
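The model-fitting approach described in this abstract follows a standard pattern: a trial-by-trial value update plus a softmax choice rule, scored by the likelihood of the participant's actual choices. Below is a minimal sketch of the model-free variant, assuming numpy; the function name model_free_nll, the parameterization (learning rate alpha, inverse temperature beta), and the fabricated trial arrays are illustrative, not the study's code. The MB and MBMF variants would add a learned task-structure model and a mixing weight on top of this skeleton.

```python
# A minimal sketch of fitting a model-free RL learner to choice data:
# delta-rule value updates plus a softmax choice rule, scored by negative
# log-likelihood. All names and data here are illustrative assumptions.
import numpy as np

def model_free_nll(params, choices, rewards, n_options=3):
    """Negative log-likelihood of observed choices under Q-learning."""
    alpha, beta = params  # learning rate, inverse temperature
    q = np.zeros(n_options)
    nll = 0.0
    for c, r in zip(choices, rewards):
        # Softmax choice probabilities (shifted for numerical stability)
        logits = beta * q
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        nll -= np.log(probs[c] + 1e-12)
        # Delta-rule update for the chosen option only
        q[c] += alpha * (r - q[c])
    return nll

# Illustrative usage with fabricated trial data (not the study's data)
rng = np.random.default_rng(1)
choices = rng.integers(0, 3, size=100)
rewards = rng.random(size=100)
print(model_free_nll((0.2, 3.0), choices, rewards))
```

In practice one would minimize this function over (alpha, beta) separately for each participant, for example with scipy.optimize.minimize, and then compare the model-free, MB, and MBMF fits by held-out likelihood or an information criterion.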