Browsing by Subject "Electronic data processing--Distributed processing"
Now showing 1 - 10 of 10
Item Algorithms for distributed caching and aggregation (2007-12) Tiwari, Mitul; Plaxton, C. Greg
In recent years, there has been an explosion in the amount of distributed data due to the ever-decreasing cost of both storage and bandwidth. There is a growing need for automatic distributed data management techniques. The three main areas in dealing with distributed data that we address in this dissertation are (1) cooperative caching, (2) compression caching, and (3) aggregation. First, we address cooperative caching, in which caches cooperate to locate and cache data objects. The benefits of cooperative caching have been demonstrated by various studies. We address a hierarchical generalization of cooperative caching in which caches are arranged as leaf nodes in a hierarchical tree network, and we call this variant Hierarchical Cooperative Caching. We present a deterministic hierarchical generalization of LRU that is constant-competitive when the capacity blowup is linear in d, the depth of the cache hierarchy. Furthermore, we show that any randomized hierarchical cooperative caching algorithm with capacity blowup b has competitive ratio Ω(log_b d) against an oblivious adversary. Thus we establish that there is no resource-competitive algorithm for the hierarchical cooperative caching problem. Second, we address a class of compression caching problems in which a file can be cached in multiple formats with varying sizes and encode/decode costs. In this work, we address three problems in this class of compression caching. The first problem assumes that the encode cost and decode cost associated with any format of a file are equal. For this problem we present a resource-competitive online algorithm.
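As background for the hierarchical generalization above, a minimal sketch of the classic LRU policy it builds on; the `LRUCache` class is hypothetical illustration, not code from the dissertation:

```python
from collections import OrderedDict

class LRUCache:
    """Classic LRU: evict the least recently used object when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # key -> object, oldest first

    def get(self, key):
        if key not in self.store:
            return None  # miss: caller fetches the object and calls put()
        self.store.move_to_end(key)  # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the LRU entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes most recently used
cache.put("c", 3)  # capacity exceeded: evicts "b", the least recently used
```

The hierarchical variant studied in the dissertation runs over a tree of such caches, which is where the depth parameter d enters the competitive bounds.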
To explore the existence of resource-competitive online algorithms for compression caching with arbitrary encode costs and decode costs, we address two other natural problems in the aforementioned class, and for each of these problems, we show that there exists a non-constant lower bound on the competitive ratio of any online algorithm, even if the algorithm is given an arbitrary factor capacity blowup. Thus, we establish that there is no resource-competitive algorithm for compression caching in its full generality. Third, we address the problem of aggregation over trees with the goal of adapting aggregation aggressiveness. Consider a distributed network with nodes arranged in a tree, each node having a local value. We consider the problem of aggregating values (e.g., summing values) from all nodes to the requesting nodes in the presence of writes. The goal is to minimize the total number of messages exchanged. The key challenges are to define a notion of “acceptable” aggregate values, and to design algorithms with good performance that are guaranteed to produce such values. We formalize the acceptability of aggregate values in terms of certain consistency guarantees. We propose a lease-based aggregation mechanism, and evaluate algorithms based on this mechanism in terms of consistency and performance. With regard to consistency, we adapt the definitions of strict and causal consistency to apply to the aggregation problem. We show that any lease-based aggregation algorithm provides strict consistency in sequential executions, and causal consistency in concurrent executions. With regard to performance, we propose an online lease-based aggregation algorithm, and show that, for sequential executions, the algorithm is constant-competitive against any offline algorithm that provides strict consistency.
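The tree aggregation problem above can be sketched as follows; the `Node` class and recursive message pattern are illustrative assumptions, not the dissertation's lease-based protocol:

```python
class Node:
    """A tree node holding a local value; children are subtrees."""
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

    def aggregate(self, op=lambda a, b: a + b):
        """Combine this node's value with the aggregates of all subtrees.

        Each recursive call models one request/response message pair per
        edge; the lease-based algorithm in the dissertation reduces the
        message count by caching partial results at interior nodes and
        invalidating them on writes.
        """
        result = self.value
        for child in self.children:
            result = op(result, child.aggregate(op))
        return result

# A small tree: root 1, one subtree holding 2 and 5, one leaf holding 4
tree = Node(1, [Node(2, [Node(5)]), Node(4)])
tree.aggregate()     # sum of all local values: 12
tree.aggregate(max)  # works for any associative operator
```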
Our online lease-based aggregation algorithm is presented in the form of a fully distributed protocol, and the aforementioned consistency and performance results are formally established with respect to this protocol. We also present experimental results to show that the algorithm performs well under various workloads.
Item Essays on market-based information systems design and e-supply chain (2005-12) Guo, Zhiling, 1974-; Whinston, Andrew B.
Item Mechanisms and algorithms for large-scale replication systems (2004) Venkataramani, Arunkumar; Dahlin, Michael
Item Observation and verification of software for distributed systems (1995-08) Tomlinson, Alexander Ivor, 1964-; Not available
Item Offline debugging of distributed processes (1991) Chin, Bryan Scott, 1968-; Garg, Vijay K. (Vijay Kumar), 1963-
This thesis addresses the problem of debugging a distributed system. We define debugging as the process of diagnosing and correcting errors in a target application. In distributed systems, problems arise from the non-deterministic execution of distributed processes; hence, we cannot take concepts directly from debuggers of sequential programs. We categorize debuggers as static, interactive, or post-mortem, according to the time at which they perform their analysis of the target application. This thesis focuses on offline debugging, a type of post-mortem debugging. We choose offline debugging because of its automated control, its ability to model and search the entire state space, and its reduced probe effect. An offline debugging system consists of two components: the monitor and the offline debugger itself. The monitor individually observes each involved process as it executes and creates trace files. The offline debugger then arranges these local trace files into a global search space and searches it to determine whether certain predicate search expressions held during the execution of the target application.
In this implementation, we use four algorithms based on a depth-first traversal of the consistent lattice to detect the predicate search expressions. Finally, we present the C language implementation of our practical offline debugger. We conclude by noting that while the offline debugger is a powerful tool, a truly complete set of development tools should include more than one debugger to address different stages of the software lifecycle.
Item Online detection of distributed predicates in distributed programs (1992) Hoagland, Greg Michael, 1967-; Garg, Vijay K. (Vijay Kumar), 1963-
This thesis presents an online debugger for detecting distributed predicates in a distributed program, and an environment for creating event-driven distributed programs that can be debugged by our online debugger. Detecting predicates in a distributed program is a difficult problem because the global state is divided among the processors in the distributed system. Also, some of the predicates that are well-defined for a sequential program are ill-defined for a distributed program. We use a special logic for defining the syntax and semantics of distributed predicates. Our debugger implements a set of distributed algorithms that can detect, online, a subset of these predicates in a distributed program. Once a predicate is detected, the distributed computation can be halted in a global state where the predicate is true. The debugger has been developed using C on a Sun 3/80 under SunOS 4.0.
Item SAR: semantic-aware replication (2005) Gao, Lei; Dahlin, Mike
This dissertation presents a replication framework that facilitates semantic-aware data replication (SAR) in wide area networks (WANs). WAN data replication is fundamentally difficult; as a result, generic replication algorithms must make compromises among Consistency, Availability, Response time, and Partition resilience (CARP) when used in WANs.
This dissertation seeks to design algorithms based on specific semantics of the shared data sets (e.g., data properties, workload characteristics, and update patterns) to achieve optimized CARP trade-offs. Integrating a set of semantic-aware algorithms using distributed objects to form the SAR framework, we implement a practically important e-commerce application, the distributed TPC-W benchmark. Our prototype evaluations show significant improvements in system availability and response time while preserving the consistency guarantees desired by the TPC-W benchmark. The primary focus of the dissertation is on the development of the SAR framework. Within the framework, contributions include (a) exploiting application semantics using the object-oriented approach, (b) employing a hybrid method that integrates a number of novel replication algorithms to make an important class of applications work, (c) proposing a novel replication algorithm for the multi-writer/multi-reader replication scenario with high access locality, and (d) outlining a general-purpose replication library that uses semantic-aware objects for building other distributed applications in WANs.
Item A scalable information management middleware for large distributed systems (2005) Yalagandula, Praveen; Dahlin, Michael
Information management is one of the key tasks of any large-scale distributed application. The goal of this dissertation is to design and build a general and scalable information management middleware for large distributed systems that will facilitate the design, development, and deployment of distributed applications, and that will enable application developers to explore the trade-offs between communication cost, response latency, and consistency.
In this dissertation, we present a Scalable Distributed Information Management System (SDIMS) that aggregates information about large-scale networked systems and that can serve as a basic building block for a broad range of large-scale distributed applications by providing detailed views of nearby information and summary views of global information. To serve as a basic building block, an SDIMS should have four properties: scalability to many machines and data items, flexibility to accommodate a broad range of applications, administrative isolation for security and availability, and robustness to node and network failures. We design, implement, and evaluate an SDIMS that (1) leverages Distributed Hash Tables (DHTs) to create scalable aggregation trees, (2) provides flexibility through a simple API that lets applications control the propagation of reads and writes, and through a self-tuning mechanism that adapts the propagation to the observed load in the system, (3) provides administrative isolation through a novel Autonomous DHT algorithm, and (4) achieves robustness to node and network reconfigurations through lazy reaggregation, on-demand reaggregation, and tunable spatial replication. Through extensive simulations and micro-benchmark experiments on several real testbeds, we observe that our system is an order of magnitude more scalable than existing approaches, provides a wide range of choices for applications to control the propagation of data and trade off bandwidth cost against response latency, achieves administrative isolation at the cost of modestly increased read latency in comparison to flat DHTs, and gracefully handles failures. We implement several applications on top of SDIMS, including a file location system and a multicast system.
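To illustrate the DHT building block that SDIMS leverages, here is a simplified sketch of hashing an attribute to a key and routing toward its root, so that the union of routes from all nodes induces a per-attribute aggregation tree. The `key_of`, `parent`, and `route` functions are hypothetical, and this idealized model assumes a node exists for every corrected id; a real DHT instead routes each hop to the nearest live node:

```python
import hashlib

def key_of(attribute):
    """Hash an attribute name to a 32-bit DHT key."""
    return int(hashlib.sha1(attribute.encode()).hexdigest(), 16) & 0xFFFFFFFF

def parent(node_id, key):
    """One routing hop toward `key`: flip the highest bit where
    node_id and key disagree, strictly shrinking the XOR distance."""
    diff = node_id ^ key
    if diff == 0:
        return None  # this node is the root for the attribute
    return node_id ^ (1 << (diff.bit_length() - 1))

def route(node_id, attribute):
    """The path a node's updates travel toward the attribute's root.
    Overlapping paths from different nodes form the aggregation tree."""
    key = key_of(attribute)
    path = [node_id]
    nxt = parent(node_id, key)
    while nxt is not None:
        path.append(nxt)
        nxt = parent(nxt, key)
    return path
```

Because every attribute hashes to a different root, load from maintaining many aggregation trees spreads across the machines rather than concentrating at one node.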
We also use SDIMS in two other research efforts in our lab: as a controller for a distributed file replication system and as an information-gathering plane in a distributed network monitoring system.
Item Techniques for analyzing distributed computations (2002) Mittal, Neeraj; Garg, Vijay K. (Vijay Kumar), 1963-
Inherent non-determinism in distributed programs and the presence of multiple threads of control make it difficult to write correct distributed software. Not surprisingly, distributed systems are particularly vulnerable to software faults. To build a distributed system capable of tolerating software faults, two important problems need to be addressed: fault detection and fault recovery. The fault detection problem requires finding a (consistent) global state of the computation that satisfies a certain predicate (e.g., violation of mutual exclusion). To prevent a fault from causing serious damage, such as corrupting stable storage, it is essential that it be detected in a timely manner. However, we prove that detecting a predicate in 2-CNF, even when no two clauses contain variables from the same process, is an NP-complete problem. We develop a technique, based on computation slicing, to reduce the size of the computation and thus the number of global states to be examined when detecting a predicate. Slicing can be used to throw away the extraneous global states of the computation in an efficient manner and to focus on only those that are currently relevant for our purpose. To detect a fault, therefore, rather than searching the state space of the computation, it is much more efficient to search the state space of the slice. We identify several useful classes of predicates for which the slice can be computed efficiently. Our experimental results indicate that slicing can lead to an exponential reduction over existing techniques, in terms of both time and space, for fault detection.
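To make the fault detection problem above concrete, here is a sketch of brute-force global predicate detection using vector clocks; the `consistent` and `detect` functions are illustrative assumptions, showing only the exponential baseline search that computation slicing improves upon:

```python
from itertools import product

def consistent(cut):
    """A cut (one vector clock per process) is a consistent global state
    iff no process's included state depends on an event of another
    process that the cut excludes: for all i, j, process i's own entry
    must be at least as large as what process j has seen of i."""
    n = len(cut)
    return all(cut[i][i] >= cut[j][i] for i in range(n) for j in range(n))

def detect(states, predicate):
    """Brute-force search of the global state lattice: try every
    combination of local states (one per process, each a pair of
    vector clock and local value) and test the predicate on the
    consistent ones. The state space is exponential in the number
    of processes, which is what motivates slicing."""
    for combo in product(*states):
        clocks = [vc for vc, _ in combo]
        values = [val for _, val in combo]
        if consistent(clocks) and predicate(values):
            return values
    return None

# Two processes exchanging one message each; local values shown per event.
p0 = [([1, 0], 0), ([2, 1], 1)]
p1 = [([0, 1], 0), ([1, 2], 1)]
detect([p0, p1], lambda v: v[0] == 1 and v[1] == 1)  # found: [1, 1]
```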
To recover from faults, we consider the rollback recovery approach, which involves restoring the system to a previous state and then re-executing. We focus on rollback recovery using controlled re-execution, which is useful and effective for tolerating synchronization faults. Unlike other approaches, which depend on chance and do not ensure that the re-execution is fault-free, the controlled re-execution method avoids synchronization faults during re-execution in a deterministic fashion. Specifically, it selectively adds synchronization dependencies during re-execution to ensure that the previously detected synchronization faults do not occur again. We provide efficient algorithms to solve the problem for two important classes of synchronization faults.
Item Transparent replication (2006) Nayate, Amol Pramod; Dahlin, Mike
Increasing user expectations and demands have caused web services to evolve away from single-server systems and toward distributed systems, for their ability to provide improved throughput, improved availability, and reduced response times. However, for a service to run on a distributed system, each running instance must be able to access data that are shared among the instances. Although existing off-the-shelf replication systems - e.g., distributed file systems [52, 61, 32, 38, 41], replicated databases [64, 75], distributed hash tables [58, 59, 63, 34], etc. - simplify access to shared data by exporting well-researched interfaces, their implementations are typically not engineered for the unique environments presented by many web services. For example, replication systems that require synchronization across multiple nodes to handle modified data [38, 12], or systems that require all nodes to keep a copy of all data [64, 75], may not be practical for use in such services.
Although the problem of general replication is impossible to solve [11, 62, 33], we focus our study on a class of single-writer services, which we denote Information Dissemination Services, that forms a restrictive but important set of web services. Our research makes two key contributions. First, we show that our system, TRIP, replicates dynamic data for Information Dissemination Services in a manner that is nearly transparent to the service. We (1) develop a novel dual-channel replication algorithm for TRIP that uses spare network capacity as background traffic to speculatively replicate data in a safe, non-interfering fashion, (2) show how to integrate safe speculative replication with mechanisms that use invalidates to provide consistency, and (3) demonstrate how our combination of consistency and safe speculative replication allows us to provide near-ideal consistency, performance, and availability for Information Dissemination Services. Second, we show that the core principles behind building TRIP can be extended to build a new replication framework and a more general replication toolkit. In particular, we show that it is possible to extend the dual-queue mechanisms developed for TRIP to a multi-writer environment in which nodes can synchronize multiple incoming streams of data and consistency information. Our extension provides various forms of consistency for arbitrary topologies - two key properties of the PRACTI [6] (Partial Replication, Arbitrary Consistency, Topology Independence) architecture.
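A minimal sketch of the dual-channel idea described above, assuming a hypothetical `Replica` class rather than the TRIP implementation: invalidations arrive on a small, ordered demand channel and mark data stale, while bodies may arrive speculatively on a background channel and are applied only once their invalidation has been seen, so speculation never exposes inconsistent data:

```python
class Replica:
    """Sketch of a dual-channel replica (illustrative, not TRIP itself)."""
    def __init__(self):
        self.data = {}     # key -> (version, value) currently readable
        self.stale = {}    # key -> invalidated version awaiting its body
        self.pending = {}  # (key, version) -> speculatively received body

    def invalidate(self, key, version):
        """Demand channel: mark `key` stale at `version`; if its body
        already arrived speculatively, it is now safe to apply."""
        self.stale[key] = version
        body = self.pending.pop((key, version), None)
        if body is not None:
            self._apply(key, version, body)

    def push_body(self, key, version, value):
        """Background channel: a body may arrive before its invalidation;
        hold it until the invalidation makes applying it safe."""
        if self.stale.get(key) == version:
            self._apply(key, version, value)
        else:
            self.pending[(key, version)] = value

    def _apply(self, key, version, value):
        self.data[key] = (version, value)
        if self.stale.get(key) == version:
            del self.stale[key]

    def read(self, key):
        """Reads of stale keys must block and fetch on demand."""
        if key in self.stale:
            raise RuntimeError("stale: must fetch on demand")
        entry = self.data.get(key)
        return entry[1] if entry else None
```

The point of the split is that the small, ordered invalidation stream carries the consistency guarantee, while the large data bodies can ride spare capacity without affecting correctness.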