Monitoring uncertain data for sensor-based real-time systems
Monitoring user-defined constraints on time-varying data is a fundamental function in many sensor-based real-time applications, such as environmental monitoring, process control, and location-based surveillance. In general, these applications track real-world objects and continuously evaluate the constraints over the object traces so that they can react in a timely manner when a constraint is violated or satisfied. Ideally, every constraint would be evaluated accurately and in real time, but data streams often carry incomplete and delayed information, which makes the evaluation results uncertain to some degree. In this dissertation, we provide a comprehensive approach to monitoring constraint-based queries over data streams whose data values or timestamps are inherently uncertain. First, we propose a generic framework, Ptmon, for monitoring timing constraints and detecting their violation early, based on the notion of probabilistic violation time. As part of this framework, we provide a systematic approach for deriving a set of necessary timing constraints at compile time. Our work is novel in that the framework is modular with respect to the probability distributions of timestamp values, and we demonstrate its applicability to different timestamp models. Second, we present a probabilistic timing join operator, Ptjoin, as an extension of Ptmon; it performs stream joins based on both temporal proximity and temporal uncertainty. To check the Ptjoin condition efficiently upon event arrivals, we introduce a stream-partitioning technique that tightly bounds the probing range. Third, we address the problem of monitoring value-based constraints expressed as range predicates on uncertain data values with confidence thresholds.
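To make the notion of a probabilistic violation time more concrete, the following is a minimal sketch under simplifying assumptions that are ours, not the dissertation's: a single constraint "e2 must occur within `deadline` time units after e1", where only e1's timestamp T1 is uncertain and is treated as independent of e2's arrival. If e2 has not arrived by time t, then any T1 <= t - deadline already forces a violation, so the violation probability is at least F(t - deadline), where F is the CDF of T1. The earliest time at which this bound reaches a confidence threshold theta is F^{-1}(theta) + deadline. The class and function names (`UniformTimestamp`, `GaussianTimestamp`, `violation_time`, `report_violation`) are hypothetical; the two timestamp models are pluggable, loosely mirroring the modularity with respect to timestamp distributions described above.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass(frozen=True)
class UniformTimestamp:
    """Hypothetical model: the true timestamp lies uniformly in [low, high]."""
    low: float
    high: float

    def inv_cdf(self, p: float) -> float:
        return self.low + p * (self.high - self.low)


@dataclass(frozen=True)
class GaussianTimestamp:
    """Hypothetical model: the true timestamp is N(mean, std**2)."""
    mean: float
    std: float

    def inv_cdf(self, p: float) -> float:
        return NormalDist(self.mean, self.std).inv_cdf(p)


def violation_time(t1, deadline: float, theta: float) -> float:
    """Earliest time t at which the absence of e2 implies, with probability at
    least theta, that the constraint T2 - T1 <= deadline is violated.

    If e2 has not arrived by t, then T2 > t, so any T1 <= t - deadline already
    forces a violation. The violation probability is therefore at least
    P(T1 <= t - deadline); solving P(T1 <= t - deadline) = theta gives
    t = F^{-1}(theta) + deadline, where F is the CDF of T1.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    return t1.inv_cdf(theta) + deadline


def report_violation(t1, deadline: float, theta: float,
                     now: float, e2_arrived: bool) -> bool:
    """True if a probabilistic violation should be reported at time `now`."""
    return (not e2_arrived) and now >= violation_time(t1, deadline, theta)
```

For example, with T1 uniform on [10, 14], a deadline of 5, and theta = 0.9, `violation_time` evaluates to 10 + 0.9 * 4 + 5 = 18.6. A monitor that still has not seen e2 at that point could report the violation before time 19, the latest instant at which the deadline could possibly expire, which is where the early detection comes from.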
To monitor such value-based constraints, we introduce a new scheme, Spmon, that reduces the amount of data transmitted and thus expedites the processing of uncertain data streams. The similarity concept, originally developed for real-time databases, is extended to our probabilistic data stream model, in which each data value is given by a probability distribution. In particular, for uniform and Gaussian distributions, we show how to derive a set of constraints on distribution parameters that serve as a metric of similarity distance, exploiting the semantics of the probabilistic queries being monitored. The derived constraints allow us to formulate a probabilistic similarity region that suppresses unnecessary data transmission in a monitoring system.
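As a rough illustration of how a constraint on distribution parameters can follow from query semantics, the sketch below considers a single range query P(a <= X <= b) >= theta over a Gaussian value whose standard deviation sigma is fixed and known, so that only the mean mu changes between readings. The probability is symmetric about c = (a + b) / 2 and decreases as |mu - c| grows (its derivative with respect to mu is proportional to phi((a - mu)/sigma) - phi((b - mu)/sigma)), so the admissible means form an interval, which is located here by bisection. The fixed-sigma assumption, the function names, and the `SuppressingSource` filter, which transmits only when the query outcome would change, are hypothetical simplifications and not the Spmon scheme itself; a helper for the uniform case evaluates the same predicate.

```python
from statistics import NormalDist
from typing import Optional, Tuple


def prob_in_range_uniform(lo: float, hi: float, a: float, b: float) -> float:
    """P(a <= X <= b) for X ~ Uniform[lo, hi]; assumes lo < hi."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (hi - lo)


def prob_in_range_gaussian(mu: float, sigma: float, a: float, b: float) -> float:
    """P(a <= X <= b) for X ~ N(mu, sigma**2); assumes sigma > 0."""
    d = NormalDist(mu, sigma)
    return d.cdf(b) - d.cdf(a)


def gaussian_mean_region(sigma: float, a: float, b: float, theta: float,
                         iters: int = 100) -> Optional[Tuple[float, float]]:
    """Interval of means mu with P(a <= X <= b) >= theta, sigma held fixed.

    Returns None if even the best-placed mean (mu = c) fails the query.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    c = (a + b) / 2
    if prob_in_range_gaussian(c, sigma, a, b) < theta:
        return None
    # Bisect on the half-width h: P(c + h) >= theta at lo_h, < theta at hi_h.
    lo_h, hi_h = 0.0, (b - a) / 2 + 10 * sigma  # P(c + hi_h) is essentially 0
    for _ in range(iters):
        mid = (lo_h + hi_h) / 2
        if prob_in_range_gaussian(c + mid, sigma, a, b) >= theta:
            lo_h = mid
        else:
            hi_h = mid
    return (c - lo_h, c + lo_h)


class SuppressingSource:
    """Hypothetical sensor-side filter: send an update only when the query
    outcome for the new mean differs from the last outcome reported."""

    def __init__(self, sigma: float, a: float, b: float, theta: float):
        self.region = gaussian_mean_region(sigma, a, b, theta)
        self.last_outcome: Optional[bool] = None

    def outcome(self, mu: float) -> bool:
        return self.region is not None and self.region[0] <= mu <= self.region[1]

    def should_transmit(self, mu: float) -> bool:
        out = self.outcome(mu)
        if out != self.last_outcome:
            self.last_outcome = out
            return True
        return False
```

Under these assumptions, with sigma = 1, a = 0, b = 10, and theta = 0.9, the admissible means are roughly [1.28, 8.72]. A source whose mean moves from 5 to 6 would stay silent, while a move to 9.5 would trigger a transmission because the query outcome flips.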