Decision-making for autonomous agents in adversarial or information-scarce settings

Date

2023-08

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Autonomous agents often operate in adversarial or information-scarce settings. These settings exist due to various factors, such as the coexistence of non-cooperative agents, computation limitations, communication losses, and imperfect sensors. To ensure high performance in the presence of such factors, decision-making algorithms for autonomous agents must limit the amount of sensitive information leaked to adversaries and rely on minimal information about their environment. We consider a variety of problems where an autonomous agent operates in an adversarial or information-scarce setting, and present novel theory and decision-making algorithms for these problems. First, we focus on an adversarial setting where a malicious agent aims to deceive its supervisor in probabilistic supervisory control setting. We formulate the deception problem as an expected cost minimization problem in a Markov decision process (MDP) where the cost function is motivated by the results from hypothesis testing. We show the existence of an optimal stationary deceptive policy and provide algorithms for the synthesis of optimal deceptive policies. From the perspective of the supervisor, we prove the NP-hardness of synthesizing optimal reference policies that prevent deception. We also show that synthesizing optimal deceptive policies under partial observations is NP-hard and provide synthesis algorithms by considering special classes of policies and MDPs. Second, as a part of decision-making in information-scarce settings, we consider a multiagent decision-making problem where a group of agents cooperates under communication losses. We model this problem with a multiagent MDP, quantify the intrinsic dependencies between the agents induced by their joint policy, and develop a decentralized policy execution algorithm for communication losses. For a variety of communication loss models, we provide performance lower bounds that are functions of the dependencies between the agents. We develop an algorithm for the synthesis of minimally dependent policies that optimize these lower bounds and thereby remain performant under communication losses. Finally, we consider the problem of optimization under limited information since autonomous agents often perform optimization as a part of their operation. We develop optimization algorithms for smooth convex optimization using sub-zeroth-order oracles that provide less information than zeroth and first-order oracles. For the directional preference oracle that outputs the sign of the directional derivative at the query point and direction, we show a 𝒪̃(𝑛⁴) sample complexity upper bound where 𝑛 is the number of dimensions. For the comparator oracle that compares the function value at two query points and outputs a binary comparison value, we show a 𝒪̃(𝑛⁴) sample complexity upper bound. For the noisy value oracle, we develop an algorithm with 𝒪̃(𝑛 [superscript 3.75] 𝒯 [superscript 0.75]) high probability regret bound where 𝒯 is the number of queries.

Description

LCSH Subject Headings

Citation