Discrete-time partially observed Markov decision processes: ergodic, adaptive, and safety control
In this dissertation we study stochastic control problems for systems modelled by discrete-time partially observed Markov decision processes. The issues we consider include ergodic control, adaptive control, and safety control. For ergodic control we propose a new condition that weakens the interior accessibility assumption suggested recently. Using the standard procedure to transform the partially observed control problem into its completely observed equivalent, and then applying the vanishing discount method, we obtain Bellman’s ergodic optimality equation, which characterizes the optimal policy. We also provide an example to compare our assumption with those of previous work. When there is more than one decision maker in the system, we formulate the problem as a stochastic non-cooperative game in which each decision maker seeks to minimize his or her own long-run average cost. A special class of systems with two decision makers and a mixed observation structure is considered, and the existence of a Nash equilibrium for the policies is proved. In the study of adaptive control we extend the ergodic control setting to the case where the transition matrix is parameterized by an unknown vector. Motivated by notions of weak ergodicity, we propose a condition on the structure of the transition matrix that results in ergodic behavior of the underlying controlled process. Under additional hypotheses, we show that the proposed adaptive policy is self-optimizing in an appropriate sense. We also introduce a new concept, designated safety control, in which the notion of safety is specified in terms of membership in a set called the safe set. We study the choice of an appropriate policy (called a safe policy) and an initial state probability distribution such that a safety request, which requires the state probability distribution of the system to lie in a given convex set at each time step, is met.
Since the choice of a safe policy is not unique in general, we apply techniques of constrained Markov decision processes to find a policy that is optimal in an appropriate sense among the candidates. We also develop an algorithm to find the largest set of initial state probability distributions for which a given safe policy meets the safety request. The algorithm is proved to terminate in finitely many steps under reasonable assumptions. Finally, we investigate safety control under partial observations. A machine replacement problem is studied in detail and numerical simulations are presented.
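To make the safety request concrete, the following is a minimal sketch and not the algorithm developed in the dissertation. It assumes a fully observed two-state machine (working, failed), a fixed policy that induces a closed-loop transition matrix `P`, and a safe set given as a polytope `{x : A x <= b}`. The matrix entries, the 0.15 bound, and all function names are hypothetical choices made for illustration only. The check covers a finite horizon and therefore does not by itself establish safety for all time.

```python
# Hypothetical sketch: all names and numbers below are illustrative assumptions.

def step(dist, P):
    """Propagate a state distribution one step: returns dist * P (P row-stochastic)."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def in_safe_set(dist, A, b, tol=1e-12):
    """Test membership in the polytope {x : A x <= b}, a convex set of distributions."""
    return all(
        sum(a_k * x_k for a_k, x_k in zip(row, dist)) <= b_i + tol
        for row, b_i in zip(A, b)
    )

def is_safe(dist0, P, A, b, horizon):
    """Check that the distribution lies in the safe set at times 0, 1, ..., horizon."""
    dist = list(dist0)
    for _ in range(horizon + 1):
        if not in_safe_set(dist, A, b):
            return False
        dist = step(dist, P)
    return True

# State 0 = working, state 1 = failed (assumed closed-loop dynamics under a fixed policy).
P = [[0.9, 0.1],
     [0.8, 0.2]]
A = [[0.0, 1.0]]  # constraint row: the probability of being in the failed state ...
b = [0.15]        # ... must not exceed 0.15 at any checked time step

safe_start = is_safe([1.0, 0.0], P, A, b, horizon=50)    # starts in "working"
unsafe_start = is_safe([0.5, 0.5], P, A, b, horizon=50)  # violates the bound at time 0
```

In this toy instance the first initial distribution satisfies the request over the checked horizon and the second violates it immediately, which illustrates why the set of admissible initial distributions depends on both the policy and the safe set.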