Identifying infection processes with incomplete information

Milling, Philip Christopher

Identifying infection processes with incomplete information

Access full-text files

MILLING-DISSERTATION-2014.pdf (1.49 MB)

Date

2014-12

Authors

Milling, Philip Christopher

Abstract

Infections frequently occur on both networks of devices and networks of people, and can model not only viruses, but also information, rumors, and product use. However, in many circumstances, the infection process itself is hidden, and only the effects, e.g. sickness or knowledge, can be observed. In addition, this information is likely incomplete, missing many sick nodes, as well as inaccurate, with false positives. To use this data effectively, it is often essential to identify the infection process causing the sickness, or even whether the cause is an infection. For our purposes, we consider the susceptible-infected (SI) infection model. We seek to distinguish between infections and random sickness, as well as between different infection (or infection-like) processes in a limited information setting. We formulate this as a hypothesis testing problem, where (typically) in the null, the sickness affects nodes at random, and in the alternative, the infection is spread through the network. Similarly, we consider the case where the sickness may be caused by one of two infection (or infection-like) processes, and we wish to find which is the causative process. We do this is a setting with very limited information, given only a single snapshot of the infection. Only a small portion of the infected population reports the sickness. In addition, there are several other limitations we consider. There may be false positives, obfuscating the infection. Similarly, there may be a random sickness and epidemic process occurring simultaneously. Knowledge of the graph topology may be incomplete, with unknown edges over which the infection may spread. The graph may also be weighted, affecting the way the infection spreads over the graph. In all these cases, we develop algorithms to identify the causative process of the infection utilizing the fact that infected nodes will be clustered. We demonstrate that under reasonable conditions, these algorithms detect an infection with asymptotically zero error probability as the graph size increases.