# Browsing by Subject "Distributed computing"

Now showing 1 - 11 of 11


## Accelerating graph computation with system optimizations and algorithmic design (2021-08-06)

Hoang, Loc Dac; Pingali, Keshav; Huang, Qixing; Rossbach, Christopher; Wu, Bo

Most data in today's world can be represented in graph form, and these graphs can then be used as input to graph applications to derive useful information, such as shortest paths in a road network, similarity between drugs in a drug-protein network, persons of interest in a social network, or recommended products for customers in a purchase-history graph. Graphs grow larger over time, so there is an ever-growing need for efficient graph applications. Developers typically have two ways to accelerate a graph application: (1) optimize the systems on which it runs, and/or (2) optimize the algorithm itself and gain speedup through algorithmic novelties. This dissertation accelerates graph applications from both perspectives; broadly, the work splits into systems work and algorithmic work. On the systems side, I present the CuSP system and the DeepGalois system: 1. CuSP, or Customizable Streaming Partitioner, is a fast, general distributed graph partitioner that generates partitions for distributed graph analytics systems. CuSP addresses the problem that existing partitioners are slow and handle only a few built-in policies: it gives users a general interface for specifying streaming partitioning policies, which CuSP then uses to partition graphs efficiently. Our evaluation shows that CuSP can partition up to 22× faster than the state-of-the-art offline graph partitioner XtraPulp while producing partitions that let graph applications run 2× faster on average than XtraPulp's partitions.
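The policy interface described above can be illustrated with a minimal sketch (hypothetical function names, not the actual CuSP API): a partitioning policy is just a function from an edge to a host, and the partitioner makes a single streaming pass over the edge list.

```python
# Minimal sketch of streaming edge partitioning (hypothetical, not the CuSP API).
# A "policy" is any function mapping an edge (u, v) to a host id; the
# partitioner assigns each edge to a host in one streaming pass.

def partition_stream(edges, num_hosts, policy):
    """Assign each edge to a host according to a user-supplied policy."""
    parts = [[] for _ in range(num_hosts)]
    for u, v in edges:
        parts[policy(u, v, num_hosts)].append((u, v))
    return parts

# Two example policies: 1D partitioning by source vertex, and a hash-based policy.
def by_source(u, v, k):
    return u % k

def by_hash(u, v, k):
    return hash((u, v)) % k

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)]
print(partition_stream(edges, 2, by_source))
# [[(0, 1), (0, 2), (2, 3)], [(1, 2), (3, 0)]]
```

Under such an interface, supporting a new algorithm-specific policy (as in the triangle-counting case study mentioned below) only requires writing another small policy function; the streaming pass itself is unchanged.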
CuSP can be extended to let users express partitioning policies specific to their algorithms, as we show in a case study with distributed triangle counting. 2. DeepGalois is a distributed graph neural network system built on the observation that graph neural network computation can be expressed as a graph problem, which allows it to be implemented in graph analytics systems. DeepGalois is built from existing distributed graph analytics components: CuSP and the Gluon communication substrate are used to partition GNN graphs and to efficiently communicate partial aggregations and gradients. It also supports sampling and minibatching of the graph. Experimental results on up to 128 CPU machines demonstrate that DeepGalois scales and that its epoch time for the best host configurations on the evaluated graphs is on average 2× faster than DistDGL's epoch time for its best host configurations. From an algorithmic perspective, I present a novel round-efficient distributed betweenness centrality algorithm and a novel formulation of the graph transformer network as a graph algorithm that allows more efficient computation. 1. Min Rounds Betweenness Centrality (MRBC) is a provably round-efficient BC algorithm that uses a novel message update rule: a vertex sends an update only once it knows the data it is going to send is finalized. We prove the correctness of this rule and establish bounds on the maximum number of rounds the algorithm takes. We implement MRBC in the D-Galois distributed graph analytics system, where we further reduce communication overhead by relaxing proxy update frequency based on the message-send rule. Evaluation shows that, compared to a classic Brandes BC algorithm implementation, MRBC reduces the number of rounds by 14× and communication time by 2.8× on average. 2.
Graph Transformer Networks (GTNs) are a variant of graph convolutional networks that learn and use important typed paths, called metapaths, in a heterogeneous graph in order to improve task accuracy. The original formulation of the problem uses a series of dense matrix multiplications that are space-inefficient, and the matrix formulation makes it difficult to use fine-grained graph techniques like sampling. We formulate the GTN problem as a graph problem that is more space-efficient because it does not need dense matrices. In addition, because it is formulated as a graph algorithm, we can apply metapath sampling on top of it to significantly decrease the computational load. Evaluation shows that the sampling-based graph formulation of the GTN can be up to 155× faster than the original formulation without any compromise in task accuracy.

## Algorithms for analyzing parallel computations (2017-08)

Chauhan, Himanshu; Garg, Vijay K. (Vijay Kumar), 1963-; Julien, Christine; Mittal, Neeraj; Nikolova, Evdokia; Pingali, Keshav; Reddi, Vijay Janapa

Predicate detection is a powerful technique for verifying parallel programs. Verifying correctness with this technique involves two steps: first we create a partial-order-based model, called a computation, of an execution of a parallel program, and then we check all possible global states of this model against a predicate that encodes a faulty behavior. A partial order encodes many total orders, so even with one execution of the program we can reason over multiple alternate execution scenarios. This dissertation makes algorithmic contributions to predicate detection in three directions. Enumerating all consistent global states of a computation is a fundamental requirement in predicate detection, and multiple algorithms have been proposed to perform this enumeration.
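The enumeration problem just described can be sketched as a breadth-first search over the lattice of cuts (illustrative encoding, not an algorithm from the dissertation): a global state of an n-process computation is a cut (c1, ..., cn) saying process i has executed its first c_i events, and a cut is consistent when every message received inside it was also sent inside it.

```python
from collections import deque

# Naive BFS enumeration of the consistent global states (cuts) of a computation.
# Encodings here are illustrative, not the dissertation's algorithm.

def bfs_consistent_cuts(num_events, deps):
    """num_events[p] = number of events on process p;
    deps[(p, i)] = events (q, j) that must precede event i of process p
    (e.g. the send matching a receive)."""
    n = len(num_events)
    start = (0,) * n
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        cut = queue.popleft()
        order.append(cut)
        for p in range(n):
            if cut[p] < num_events[p]:
                # Event (p, cut[p]) is the next event on process p; it can be
                # added only if every event it depends on is already in the cut.
                if all(cut[q] > j for q, j in deps.get((p, cut[p]), [])):
                    nxt = cut[:p] + (cut[p] + 1,) + cut[p + 1:]
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return order

# Process 0's event 0 sends a message that process 1 receives as its event 0,
# so the cut (0, 1) is inconsistent and is never enumerated.
cuts = bfs_consistent_cuts([1, 1], {(1, 0): [(0, 0)]})
print(cuts)  # BFS order: (0, 0), (1, 0), (1, 1)
```

Note that this naive BFS keeps every discovered cut in the `seen` set, which is exactly the worst-case space blow-up (exponential in the number of processes) that the polynomial-space algorithm discussed below avoids.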
Among these, the breadth-first search (BFS) enumeration algorithm is especially useful, as it finds an erroneous consistent global state with the least number of events possible. The traditional algorithm for BFS enumeration of consistent global states was given more than two decades ago and is still widely used. This algorithm, however, requires space that in the worst case may be exponential in the number of processes in the computation. We give the first algorithm that performs BFS enumeration of the consistent global states of a computation in space that is polynomial in the number of processes. Detecting a predicate on a computation is a hard problem in general, so devising efficient detection and analysis algorithms requires exploiting knowledge about the properties of the predicate. We present algorithms that exploit the properties of two classes of predicates, called stable and counting predicates, and provide significant reductions (exponential in many cases) in the time and space required to detect them. The technique of computation slicing creates a compact representation, called a slice, of all global states that satisfy a class of predicates called regular predicates. We present the first distributed and online algorithm to create a slice of a computation with respect to a regular predicate. In addition, we give efficient algorithms to create slices for two important temporal logic formulas even when the underlying predicate is not regular but either the predicate or its negation is efficiently detectable.

## Compiler and system for resilient distributed heterogeneous graph analytics (2020-05)

Gill, Gurbinder Singh; Pingali, Keshav; Fussell, Donald S; Rossbach, Christopher J; Peri, Ramesh; Mytkowicz, Todd

Graph analytics systems are used in a wide variety of applications, including health care, electronic circuit design, machine learning, and cybersecurity.
Graph analytics systems must handle very large graphs such as the Facebook friends graph, which has more than a billion nodes and 200 billion edges. Since individual machines have limited main memory, distributed-memory clusters with sufficient memory and computational power are required to process these graphs. In distributed graph analytics, the graph is partitioned among the machines in a cluster, and communication between partitions is implemented using a substrate like MPI. However, programming distributed-memory systems is not easy, and the recent trend towards processor heterogeneity has added to this complexity. To simplify the programming of graph applications on such platforms, this dissertation first presents a compiler called Abelian that translates shared-memory descriptions of graph algorithms written in the Galois programming model into efficient code for distributed-memory platforms with heterogeneous processors. An important runtime parameter to the compiler-generated distributed code is the partitioning policy. We present an experimental study of partitioning strategies for distributed work-efficient graph analytics applications on different CPU-architecture clusters at large scale (up to 256 machines), and based on the study we give a simple rule of thumb for selecting among the myriad policies. Another challenge of distributed graph analytics that we address in this dissertation is machine fail-stop failures, an important concern especially for long-running graph analytics applications on large clusters. We present a novel communication and synchronization substrate called Phoenix that leverages the algorithmic properties of graph analytics applications to recover from faults with zero overhead during fault-free execution, and we show that Phoenix is 24× faster than previous state-of-the-art systems.
In this dissertation, we also look at the new opportunities for graph analytics on massive datasets brought by byte-addressable memory technologies with higher density and lower cost than DRAM, such as Intel Optane DC Persistent Memory, which enables the design of affordable systems that support up to 6 TB of randomly accessible memory. We present key runtime and algorithmic principles to consider when performing graph analytics on massive datasets on Optane DC Persistent Memory, and we highlight ideas that apply to graph analytics on all large-memory platforms. Finally, we show that our distributed graph analytics infrastructure can be used for a new domain of applications, in particular embedding algorithms such as Word2Vec. Word2Vec trains vector representations of words (word embeddings) on a large text corpus; the resulting embeddings have been shown to capture semantic and syntactic relationships among words. Other examples include Node2Vec, Code2Vec, Sequence2Vec, etc. (collectively known as Any2Vec), with a wide variety of uses. We formulate the training of such applications as a graph problem and present GraphAny2Vec, a distributed Any2Vec training framework that leverages the state-of-the-art distributed heterogeneous graph analytics infrastructure developed in this dissertation to scale Any2Vec training to large distributed clusters. GraphAny2Vec also demonstrates a novel way of combining model gradients during training, which allows it to scale without losing accuracy.

## Dataflow parallelism for large scale data mining (2010-08)

Daruru, Srivatsava; Ghosh, Joydeep; Marin, Nena

The unprecedented and exponential growth of data, along with the advent of multi-core processors, has triggered a massive paradigm shift from traditional single-threaded programming to parallel programming.
A number of parallel programming paradigms have thus been proposed and have become pervasive in, and inseparable from, large production environments. With massive amounts of data available and an ever-increasing business need to process and analyze that data quickly at minimum cost, there is strong demand for fast data mining algorithms on cheap hardware. This thesis explores a parallel programming model called dataflow, the essence of which is computation organized by the flow of data through a graph of operators. This paradigm exhibits pipeline, horizontal, and vertical parallelism, and requires only the data of the active operators to be in memory at any given time, allowing it to scale easily to very large datasets. The thesis describes the dataflow implementation of two data mining applications on huge datasets. We first develop an efficient dataflow implementation of a Collaborative Filtering (CF) algorithm based on weighted co-clustering and test its effectiveness on the large, sparse Netflix dataset. This implementation of the recommender system was able to rapidly train on and predict over 100 million ratings within 17 minutes on a commodity multi-core machine. We then describe a dataflow implementation of a non-parametric, density-based clustering algorithm called Auto-HDS that automatically detects small, dense clusters in a massive astronomy dataset. This implementation was able to discover dense clusters at varying density thresholds and generate a compact cluster hierarchy on 100k points in less than 1.3 hours. We also show its ability to scale to millions of points as the number of available resources increases.
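The operator-graph model described above can be sketched with Python generators (a toy illustration of the dataflow paradigm, not the thesis's implementation): each operator pulls records from its input, so only the records currently in flight are materialized and the pipeline streams over inputs far larger than memory.

```python
# Toy dataflow pipeline: computation organized as a chain of operators
# through which records flow lazily. Operator names are illustrative.

def source(records):
    for r in records:
        yield r

def op_map(stream, fn):
    # transform each record as it flows through
    for r in stream:
        yield fn(r)

def op_filter(stream, pred):
    # pass through only the records satisfying the predicate
    for r in stream:
        if pred(r):
            yield r

# Chain the operators: ratings -> center around 3 -> keep positives.
ratings = [1, 5, 3, 4, 2, 5]
pipeline = op_filter(op_map(source(ratings), lambda r: r - 3), lambda r: r > 0)
print(list(pipeline))  # [2, 1, 2]
```

Because the operators are generators, no intermediate list is ever built; each record is processed end-to-end before the next is read, which is what lets this style scale to datasets that do not fit in memory.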
Our experimental results illustrate the ability of this model to scale to massive datasets and to rapidly discover useful patterns in two different applications.

## Design and implementation of distributed Galois (2013-05)

Dhanapal, Manoj; Pingali, Keshav

The Galois system provides a solution to the hard problem of parallelizing irregular algorithms by exploiting amorphous data-parallelism. The present system works on the shared-memory programming model, which limits the memory and processing power available to an application. A scalable distributed parallelization tool would give applications access to a very large amount of memory and processing power by interconnecting computers through a network. This thesis presents the design of a distributed execution programming model for the Galois system. This distributed Galois system is capable of executing irregular graph-based algorithms in a distributed environment. The API and programming model of the new distributed system were designed to mirror those of the existing shared-memory Galois, so that existing shared-memory applications can run on distributed Galois with minimal porting effort. Finally, two existing test cases have been implemented on distributed Galois and shown to scale with an increasing number of hosts and threads.

## Distributed computing and cryptography with general weak random sources (2011-08)

Li, Xin, Ph. D.; Zuckerman, David I.; Alvisi, Lorenzo; Kalai, Yael; Klivans, Adam; Waters, Brent

The use of randomness in computer science is ubiquitous. Randomized protocols have turned out to be much more efficient than their deterministic counterparts. In addition, many problems in distributed computing and cryptography are impossible to solve without randomness.
However, these applications typically require uniform random bits, while in practice almost all natural random phenomena are biased. Moreover, even originally uniform random bits can be damaged if an adversary learns some partial information about these bits. In this thesis, we study how to run randomized protocols in distributed computing and cryptography with imperfect randomness. We use the most general model for imperfect randomness where the weak random source is only required to have a certain amount of min-entropy. One important tool here is the randomness extractor. A randomness extractor is a function that takes as input one or more weak random sources, and outputs a distribution that is close to uniform in statistical distance. Randomness extractors are interesting in their own right and are closely related to many other problems in computer science. Giving efficient constructions of randomness extractors with optimal parameters is one of the major open problems in the area of pseudorandomness. We construct network extractor protocols that extract private random bits for parties in a communication network, assuming that they each start with an independent weak random source, and some parties are corrupted by an adversary who sees all communications in the network. These protocols imply fault-tolerant distributed computing protocols and secure multi-party computation protocols where only imperfect randomness is available. The probabilistic method shows that there exists an extractor for two independent sources with logarithmic min-entropy, while known constructions are far from achieving these parameters. In this thesis we construct extractors for two independent sources with any linear min-entropy, based on a computational assumption. We also construct the best known extractors for three independent sources and affine sources. Finally we study the problem of privacy amplification. 
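As a concrete example of a two-source extractor mentioned above, the classic Chor-Goldreich inner-product extractor (a textbook construction, not one of this thesis's contributions) outputs a single bit that is close to uniform whenever the two input samples come from independent sources, each with min-entropy rate above one half.

```python
# Sketch of the simplest two-source extractor: the inner product mod 2 of
# samples from two independent weak sources. One output bit only; a real
# extractor would output many bits, but the principle is the same.

def inner_product_extractor(x_bits, y_bits):
    """One-bit two-source extractor: <x, y> mod 2."""
    assert len(x_bits) == len(y_bits)
    return sum(a & b for a, b in zip(x_bits, y_bits)) % 2

print(inner_product_extractor([1, 0, 1, 1], [1, 1, 0, 1]))  # 1 + 0 + 0 + 1 = 2 -> 0
```

Intuitively, as long as each source has enough min-entropy, no fixed bias in either source can force the inner product towards 0 or 1, so the output bit is nearly unbiased.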
In the privacy-amplification model, two parties share a private weak random source and wish to agree on a private uniform random string through communication over a channel controlled by an adversary who has unlimited computational power and can change messages in arbitrary ways. All previous results assume that the two parties have local uniform random bits. We show that this problem can be solved even if the two parties have only local weak random sources. We also improve previous results in various respects by constructing the first explicit non-malleable extractor and giving protocols based on this extractor.

## Distributed global predicate detection algorithms (2012-08)

Wong, Don Tak-San; Garg, Vijay K. (Vijay Kumar), 1963-; Graser, Thomas

Detecting the existence of a consistent global state that satisfies a predicate in a distributed environment is a processing-intensive task, since all consistent global states must be checked to verify that none of them satisfies the predicate. Serial implementations of breadth-first, depth-first, and lexical traversals of the lattice of possible consistent global states have been given by Alagar and Venkatesan, Cooper and Marzullo, and Garg, respectively. This paper modifies those implementations to perform the checks in a distributed environment, providing the final algorithms, source code, and preliminary results for comparison with the original algorithms.

## Distributed model checking with Java PathFinder custom listeners (2017-05)

Davis, Aaron Wynn; Khurshid, Sarfraz

The goal of this project was to investigate a distributed testing system based on the Java PathFinder (JPF) model checker. A key objective was to increase the size of the state space that can be explored in a given time by spreading the test load across multiple instances of JPF while eliminating duplication among the execution paths tested by each instance.
The advantage of this approach is that one of the JPF instances may locate an error earlier than it would have been found using the standard JPF model checker. JPF's support for custom listeners was used to implement this without changes to the JPF Core source code. A custom listener was developed, along with a Java server application that manages the paths taken by each instance of JPF, prevents duplication of effort among the JPF instances, and consolidates the testing results into a single report. The result was an increase in the state space that was tested, with some tests either completing successfully or finding errors using fewer transitions than a single instance of JPF, and an increase in the number of states explored. However, due to the processing overhead required for the JPF instances to communicate with the server, there was no improvement in overall execution time, and in many cases the execution time increased compared with running a single instance of JPF. The rate at which execution time grew as more JPF instances were added was, however, low.

## The Gander search engine for personalized networked spaces (2012-12)

Michel, Jonas Reinhardt; Julien, Christine, D. Sc.; Garg, Vijay

The vision of pervasive computing is one of a personalized space populated with vast amounts of data that can be exploited by humans. Such Personalized Networked Spaces (PNetS), and the requisite support for general-purpose, expressive spatiotemporal search of the “here” and “now”, have eluded realization, due primarily to the complexities of indexing, storing, and retrieving relevant information within a vast collection of highly ephemeral data.
This thesis presents the Gander search engine, founded on a novel conceptual model of search in PNetS and targeted at environments characterized by large volumes of highly transient data. We overview this model and realize it in the architecture and implementation of the Gander search engine. Gander connects formal notions of sampling a search space to expressive, spatiotemporally aware protocols that perform distributed query processing in situ. This thesis evaluates Gander through a user study that examines the perceived usability and utility of our mobile application, and benchmarks Gander's performance in large PNetS through network simulation.

## Implicitly distributing pervasively concurrent programs (2020-12-10)

Thywissen, John Adam; Rossbach, Christopher J.; Cook, William Randall; Misra, Jayadev; Peter, Simon; Gligoric, Milos

Distributed programs are often written as a collection of communicating modules. For example, to use Java RMI, programs are divided into objects that can be remotely referenced. Yet in many cases it would be desirable to write the program without its structure being driven by distribution decisions. If distribution is decoupled from program structure, the programming language must allow communication throughout a program's structure instead of at a few known points. This is simplified if the language provides a uniform programming model for local and remote values (location transparency). We present Distributed Orc, which offers location transparency and distributes program operations automatically in cooperation with the execution environment. By eliminating any special semantics for remote values, Distributed Orc enables programmers to write cohesive distributed programs rather than programs artificially divided at distribution boundaries.
Distributed Orc is derived from the Orc language, a (centralized) concurrent orchestration language.

## Scalable and causal Bayesian inference (2021-08-30)

Chavez, Omar Demian; Williamson, Sinead; Daniels, Michael J; Linero, Antonio; Shively, Tom

This thesis focuses on two facets of Bayesian estimation. First, we propose methods that improve parameter estimation in particle filtering in a distributed computing environment by allowing periodic communication between compute nodes. The periodic communication can improve on the embarrassingly parallel version of our particle filter without dramatically increasing computational cost. Our method is intended for use on data with large N, or in streaming settings where latent parameters change over time. Second, we propose a method for estimating heterogeneous treatment effects in observational studies using transformed response variables, via a modification of Bayesian additive regression trees that incorporates a mixture model in the regression error terms.
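The periodic-communication idea in the last abstract can be sketched as follows (a hypothetical scheme for illustration, not the thesis's actual method): each node repeatedly reweights and resamples its own particles, and every `period` steps all particles are pooled, resampled globally, and redistributed so that no node's estimate drifts on its own.

```python
import random

# Toy sketch of particle filtering with periodic inter-node communication.
# All names and the weighting function are illustrative.

def resample(particles, weights, k, rng):
    """Draw k particles with probability proportional to their weights."""
    return rng.choices(particles, weights=weights, k=k)

def parallel_filter(node_particles, weight_fn, steps, period, rng):
    for t in range(1, steps + 1):
        # Local step: each node reweights and resamples independently
        # (this part is embarrassingly parallel).
        node_particles = [
            resample(ps, [weight_fn(p) for p in ps], len(ps), rng)
            for ps in node_particles
        ]
        if t % period == 0:
            # Periodic communication: pool all particles, resample globally,
            # and redistribute the same number of particles to each node.
            pooled = [p for ps in node_particles for p in ps]
            w = [weight_fn(p) for p in pooled]
            sizes = [len(ps) for ps in node_particles]
            node_particles = [resample(pooled, w, n, rng) for n in sizes]
    return node_particles

rng = random.Random(0)
nodes = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
result = parallel_filter(nodes, lambda p: 1.0 / (1.0 + p), steps=4, period=2, rng=rng)
print(result)
```

The trade-off the abstract describes is visible in the sketch: the pooling step costs communication, but it lets low-weight nodes recover particles from high-weight ones, which the purely embarrassingly parallel version cannot do.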