A scalable information management middleware for large distributed systems
Information management is one of the key tasks of any large-scale distributed application. The goal of this dissertation is to design and build a general, scalable information management middleware for large distributed systems that facilitates the design, development, and deployment of distributed applications and that enables application developers to explore the tradeoffs among communication cost, response latency, and consistency. In this dissertation, we present a Scalable Distributed Information Management System (SDIMS) that aggregates information about large-scale networked systems. By providing detailed views of nearby information and summary views of global information, an SDIMS can serve as a basic building block for a broad range of large-scale distributed applications. To serve as such a building block, an SDIMS should have four properties: scalability to many machines and data items, flexibility to accommodate a broad range of applications, administrative isolation for security and availability, and robustness to node and network failures. We design, implement, and evaluate an SDIMS that (1) leverages Distributed Hash Tables (DHTs) to create scalable aggregation trees, (2) provides flexibility through a simple API that lets applications control the propagation of reads and writes and through a self-tuning mechanism that adapts the propagation to the observed load in the system, (3) provides administrative isolation through a novel Autonomous DHT algorithm, and (4) achieves robustness to node and network reconfigurations through lazy reaggregation, on-demand reaggregation, and tunable spatial replication. 
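To make the read/write propagation tradeoff in point (2) concrete, here is a minimal sketch, not the SDIMS implementation. It assumes a fixed complete binary tree in place of DHT-derived aggregation trees, SUM as the aggregation function, one message per tree hop, and a hypothetical `up` parameter giving the number of levels a write propagates eagerly. The class name `AggregationTree` and its methods are illustrative only.

```python
class AggregationTree:
    """Hypothetical sketch: SUM over 2**height leaves arranged as a complete binary tree.

    `up` (assumed parameter) is how many levels each write is pushed eagerly.
    Reads at the root pull from the highest level that is kept fresh.
    """

    def __init__(self, height: int, up: int):
        self.height = height
        self.up = min(up, height)
        # cache[level][i] holds the aggregate of the i-th subtree at that level
        self.cache = [[0] * (2 ** (height - lvl)) for lvl in range(height + 1)]
        self.messages = 0  # counts one message per hop (simplifying assumption)

    def update(self, leaf: int, value: int) -> None:
        """Write a leaf value and eagerly re-aggregate `up` levels above it."""
        self.cache[0][leaf] = value
        idx = leaf
        for lvl in range(1, self.up + 1):
            idx //= 2
            # Recompute the parent from its two children; one message per hop.
            self.cache[lvl][idx] = (
                self.cache[lvl - 1][2 * idx] + self.cache[lvl - 1][2 * idx + 1]
            )
            self.messages += 1

    def probe_root(self) -> int:
        """Read the global aggregate at the root."""
        fresh = self.cache[self.up]  # level `up` is always up to date
        if self.up < self.height:
            # The root must pull one value from every fresh node at level `up`.
            self.messages += len(fresh)
        return sum(fresh)


if __name__ == "__main__":
    for up in (0, 1, 3):
        tree = AggregationTree(height=3, up=up)
        for leaf in range(8):
            tree.update(leaf, leaf + 1)
        total = tree.probe_root()
        print(f"up={up} total={total} messages={tree.messages}")
```

With 8 writes and 1 read, `up=0` costs 8 messages (all spent on the read), `up=1` costs 12, and `up=3` costs 24 (all spent on writes, so the read is free). Read-heavy workloads therefore favor a larger `up`, while write-heavy workloads favor a smaller one, which is the kind of choice a self-tuning mechanism would adjust at run time.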
Through extensive simulations and micro-benchmark experiments on several real testbeds, we observe that our system is an order of magnitude more scalable than existing approaches, provides applications a wide range of choices for controlling the propagation of data to trade off bandwidth cost against response latency, achieves administrative isolation at the cost of modestly increased read latency compared to flat DHTs, and gracefully handles failures. We implement several applications on top of SDIMS, including a file location system and a multicast system. We also use SDIMS in two other research efforts in our lab: as a controller for a distributed file replication system and as an information-gathering plane in a distributed network monitoring system.