Robust estimation of tree structured probabilistic graphical models

Katiyar, Ashish

Robust estimation of tree structured probabilistic graphical models

Access full-text files

KATIYAR-DISSERTATION-2021.pdf (3.52 MB)

Date

2021-08-13

Authors

Katiyar, Ashish

Abstract

Undirected probabilistic graphical models or Markov Random Fields (MRFs) are a powerful tool for describing high dimensional distributions using an associated dependency graph 𝔾, which encodes the conditional dependencies between random variables. They form the starting point for many efficient estimation and inference algorithms. Thus, learning the graphical model of a collection of random variables from their samples is a fundamental, and very well-studied problem. In this thesis, we study a natural variant of this problem - learning the graph structure when the random variables have independent unknown noise. We investigate this problem for the class of tree structured graphical models. In the first problem, the task is to estimate tree structured Gaussian graphical models from samples which have additive independent Gaussian noise of unknown variance. The noise in different random variables breaks down the conditional independence relationship. We ask: can the original tree structure be recovered. We prove that this problem is unidentifiable, but show that this unidentifiability is limited to a small class of candidate trees. We further present additional constraints under which the problem is identifiable. In the second problem, we consider tree structured Ising models. The random variables in Ising models have support on {−1, +1}. We consider the task of learning Ising models when the signs of different random variables are flipped independently with possibly unequal, unknown probabilities. We prove that, surprisingly, the same limited unidentifiability results that hold for Gaussian graphical models continue to hold for Ising models. In the final problem, we study the natural extension of these problems - what happens in the case of graphical models on discrete random variables with larger support size. We show that the setting of support size of 3 or more is richer as the tree may be partially or fully identifiable. We provide a precise characterization of this phenomenon and show that the extent of recoverability is dictated by the joint PMF of the random variables. In particular, we provide necessary and sufficient conditions for exact recoverability. We provide an efficient algorithm to recover the tree upto the identifiability. Finally, we conclude with the sample complexity upper and lower bounds capturing the dependence of the number of samples on the underlying parameters.