Spatial stochastic models for network analysis

Sankararaman, Abishek

Spatial stochastic models for network analysis

Access full-text files

SANKARARAMAN-DISSERTATION-2019.pdf (1.55 MB)

Date

2019-08

Authors

Sankararaman, Abishek

Abstract

This thesis proposes new stochastic interacting particle models for networks, and studies some fundamental properties of these models. This thesis considers two application areas of networking - engineering design questions in future wireless systems and algorithmic tasks in large scale graph structured data. The key innovation introduced in this thesis is to bring tools and ideas from stochastic geometry to bear on the problems in both these application domains. We identify certain fundamental questions in design and engineering both wireless systems and large scale graph structured data processing systems. Subsequently, we identify novel stochastic geometric models, that captures the fundamental properties of these networks, which forms the first research contribution. We then rigorously study these models, by bringing to bear new tools from stochastic geometry, random graphs, percolation and Markov processes to establish structural results and fundamental phase transitions in these models. Using our developed mathematical methodology, we then identify design insights and develop algorithms, which we demonstrate are instructive in many practical settings. In the setting of wireless systems, this thesis studies both ad-hoc and cellular networks. In the ad-hoc network setting, we aim to understand fundamental limits of the simplest possible protocol to access the spectrum, namely a link transmits whenever it has data to send by treating all interference as noise. Surprisingly this basic question itself was not understood, as the system dynamics is coupled spatially due to the interference links cause one another and temporally due to randomness in traffic arrivals. We propose a novel interacting particle model called the spatial birth-death wireless network model to understand the stability properties of the simple spectrum access protocol. Using tools from Palm calculus and fluid limit theory, we establish a tight characterization of when this model is stable. Furthermore, we show that whenever stable, the links in steady-state exhibit a form of clustering. Leveraging these structural results, we propose two mean field heuristics to obtain formulas for key performance metrics such as average delay experienced by a link. We empirically find that the proposed formulas for delay predicts accurately the system behavior. We subsequently study scalability properties of this model by introducing an appropriate infinite dimensional version of the model we call the Interference Queueing Networks model. The model consists of a queue located at each grid point of an infinite regular integer lattice, with the queues interacting with each other in a translation invariant fashion. We then prove several structural properties of the model namely, tight conditions for existence of stationary solutions and some sufficient conditions for uniqueness of stationary solutions. Remarkably, we obtain exact formula for mean delay in this model, unlike the continuum model where we relied on mean-field type heuristics to obtain insights. In the setting of cellular networks, we study optimal association schemes by mobile phones in the case when there are several possible base station technologies operating on orthogonal bands. We show that this choice leads to a performance gain we term technology diversity. Interestingly, we show that the performance gain relies on the amount of instantaneous information a user has on the various base station technologies that it can leverage to make the association decision. We outline optimal association schemes under various information settings that a user may have on the network. Moreover, we propose simple heuristics for association that relies on a user obtaining minimal instantaneous information and are thus practical to implement. We prove that in certain natural asymptotic regime of parameters, our proposed heuristic policy is also optimal, and thus quantifying the value of having fine grained information at a user for association. We empirically observe that the asymptotic result is valid even at finite parameter regimes that are typical in todays networks. In the application of analyzing large scale graph structured data, we consider the graph clustering problem with side information. Graph clustering is a standard and widely used task which consists in partitioning the set of nodes of a graph into underlying clusters where nodes in the same cluster are similar to each other and nodes across different clusters are different. Motivated by applications in social and biological networks, we consider the task of clustering nodes of a graph, when there is side information on the nodes, other than that contained in the graph. For instance in social networks, one has access to meta data about a person (node in a social graph) such as age, location, income etc, along with the combinatorial data of who are his friends on the social graph. Similarly, in biological networks, there is often meta-data about an experiment that provides additional contextual data about a node, in addition to the combinatorial data. In this thesis, we propose a generative model for such graph structured data with side information, which is inspired by random graph models in stochastic geometry such as the random connection model and the generative models for networks with clusters without contexts, such as the stochastic block model or the planted partition model. We propose a novel graph model called the planted partition random connection model. Roughly speaking, in this model, each node has two labels - an observable R [superscript d] valued (for some fixed d) feature label and an unobservable binary valued community label. Conditional on the node labels, edges are drawn at random in this graph depending on both the feature and community labels of the two end points. The clustering task consists in recovering the underlying partition of nodes corresponding to the respective community labels better than a random assignment, when given an observation of the graph generated and the features of all nodes. We show that if the 'density of nodes', i.e., average number of nodes having features in an unit volume of space of R [superscript d] is small, then no algorithm can cluster the graph that can asymptotically beat a random assignment of community labels. On the contrary, if the density of nodes is sufficiently high, we give a simple algorithm that recovers the true underlying partition strictly better a random assignment. We then apply the proposed algorithm to a problem in computational biology called Haplotype Phasing and observe empirically, that it obtains state of art results. This demonstrates, both the validity of our generative model, as well as our new algorithm.