Multi-objective path-planning for autonomous agents using dynamic game theory

Date

2019-02-06

Authors

Selvakumar, Jhanani

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Autonomous systems which are designed to assist humans in complex environments, are often required to reliably operate under uncertainty. When probabilistic models for uncertainty are not available, the game-theoretic framework for adversarial/cooperative interactions allows us to solve problems for autonomous systems, such as control of uncertain dynamical systems, modeling biological systems, and deployment of sensor networks. This work focuses on decision-making and control problems for autonomous agents in uncertain environments. Characteristic sources of such uncertainty are wind or oceanic flows, radiation fields, and moving obstacles. In our approach, we model the agent-environment interactions induced by these sources of uncertainty as the actions of an adversary, which tries to prevent the agent from achieving its objective (e.g., reaching a target location). This modeling naturally leads to the formulation of a dynamic game between the autonomous agent and its environment. Control problems of autonomous agents that are subject to uncertain dynamic influences such as strong winds, fit into the structure of two-player zero-sum differential games. Many modern decision-making problems, however, cannot be put under the umbrella of zero-sum games because they involve complex interplay between multiple agents, which is not purely antagonistic. In this context, we address a special class of decision-making and path-planning problems, for autonomous agents that aim to reach a specified target set while avoiding multiple adversarial elements (such as mobile agents or obstacles). This class of problems, referred to as reach-avoid problems, corresponds to multi-player non-zero-sum dynamic games. Multi-player dynamic games typically require solving coupled partial differential equations, which is computationally and temporally expensive, if at all tractable. This intractability is particularly true, for problems of high dimensionality, and if there are agents in the game which have multiple objectives. For this reason, approximate solutions to dynamic multi-agent games are desirable in practice. Considering the binary objective of our agent of interest, we propose three approaches to the path-planning problem. Each approach is based on the characterization of risk to the agent, and uses a distinct method to determine a feasible solution to the multi-agent game. First, we propose an approximate divide-and-conquer approach that allows us to compute the global path for the agent of interest by concatenating local paths computed on a dynamic graph-abstraction of the environment. Through extensive simulations, we have demonstrated the effectiveness of the proposed approach. However, the proposed method does not guarantee global optimality or completeness of the solution, and also incurs considerable computational cost at each step. To improve computational tractability of the path-planning problem, next, we propose a feedback strategy based on greedy minimization of risk, where the risk metric is characterized with regard to the dual objective of the agent of interest. The same risk metric also aids us in partitioning the state-space of the game, which is useful to infer the outcome of the game from its initial conditions. The feedback strategy is computationally simple. Further, through numerical simulations, this approach has been found to be effective in a large number of cases, in guiding the autonomous vehicle to its target set. In order to further improve the target-reaching capability of the autonomous agent, we propose a third approach, a reduction of the dynamic multi-player game to a sequence of single-act games, one played at each time step. The proposed approach is also easy to implement and also does not incur significant loss of optimality. At each step, the optimal set of player strategies can be calculated efficiently and reliably via convex programming tools. More importantly, the proposed sequential formulation of the dynamic game allows us to account for the effect of the current actions of the agents on the final outcome of the original dynamic game. However, the payoffs of future games are altered by the past games and consequently, the equilibria for the single-act games (stage-wise equilibria) might not be optimal when the dynamic game is viewed as a whole. The choice of stage-wise equilibria can be improved by recording past actions and their effect on future payoffs. Drawing upon the history of actions and outcome patterns if any, we can learn to make better choices in the present. For multi-agent games with multiple non-aligned objectives for each agent, learning processes can aid in high-level switching between the optimal strategies corresponding to individual objectives. We propose the use of model-free reinforcement learning methods to obtain a feedback policy for the agent of interest. The challenges here, are to characterize an appropriate reward function, particularly under consideration of multiple objectives for the agent, and also to optimize parameters of the learning process. The goal of this thesis is to contribute a solid framework, which is based on game theory, and combines analytical and computational techniques, to address the problem of path-planning for an autonomous agent with multiple objectives in uncertain environments

Description

LCSH Subject Headings

Citation