Accelerating graph computation with system optimizations and algorithmic design
Most data in today's world can be represented in graph form, and graph applications can then derive useful information from these graphs, such as shortest paths in a road network, similarity between drugs in a drug-protein network, persons of interest in a social network, or recommended products in a customer purchase history graph. Graphs continue to grow larger over time, so there is an ever-growing need for efficient graph applications. Developers typically have two methods for accelerating the runtime of a graph application: (1) they optimize the systems on which the graph application runs, and/or (2) they optimize the algorithm itself and gain speedup via algorithmic novelties. In this dissertation, I propose work that accelerates graph applications from both perspectives. Broadly speaking, the work presented in this dissertation is split into systems work and algorithmic work. On the systems end, I present the CuSP system and the DeepGalois system:
- CuSP, or Customizable Streaming Partitioner, is a fast, general distributed graph partitioner that generates partitions for distributed graph analytics systems. CuSP addresses the problem that existing partitioners are slow and support only a few built-in policies: it gives users a general interface for specifying streaming partitioning policies, which CuSP then uses to partition graphs efficiently. Our evaluation shows that CuSP can partition graphs up to 22× faster than the state-of-the-art offline graph partitioner XtraPulp while producing partitions that allow graph applications to run 2× faster on average than XtraPulp's partitions. CuSP can also be extended to let users express partitioning policies tailored to their algorithms, as we show in a case study with distributed triangle counting.
- DeepGalois is a distributed graph neural network (GNN) system built on the observation that GNN computation can be expressed as a graph problem, which allows it to be implemented in graph analytics systems. DeepGalois is built from existing distributed graph analytics components: CuSP and the Gluon communication substrate are used to partition GNN graphs and to efficiently communicate partial aggregations and gradients. It also supports sampling and minibatching of the graph. Experimental results on up to 128 CPU machines demonstrate that DeepGalois scales and that its epoch time, for the best host configuration on each evaluated graph, is on average 2× faster than DistDGL's epoch time for its best host configurations.

From an algorithmic perspective, I present a novel round-efficient distributed betweenness centrality algorithm and a novel formulation of the graph transformer network as a graph algorithm that allows for more efficient computation:
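To illustrate the idea of a user-specified streaming partitioning policy, the following is a minimal sketch. The `assign_edge` callback interface, the function names, and the running-load state are all hypothetical illustrations, not CuSP's actual API (CuSP is implemented in C++ and its policy interface differs):

```python
from collections import defaultdict

def stream_partition(edges, num_parts, assign_edge):
    """Assign each edge to a partition using a user-supplied streaming policy.

    `assign_edge` is a hypothetical policy callback: it receives an edge
    (src, dst) plus running partition state and returns a partition id.
    """
    parts = defaultdict(list)           # partition id -> list of edges
    state = {"loads": [0] * num_parts}  # running edge counts per partition
    for src, dst in edges:
        p = assign_edge(src, dst, state)
        parts[p].append((src, dst))
        state["loads"][p] += 1
    return parts

# Example policy: hash on the source vertex, a simple streaming rule in
# the spirit of assigning all outgoing edges of a vertex to one partition.
def hash_by_source(src, dst, state):
    return src % len(state["loads"])

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 1)]
parts = stream_partition(edges, 2, hash_by_source)
```

A different policy (e.g., one that consults `state["loads"]` to balance partitions greedily) can be swapped in without changing the partitioning loop, which is the flexibility the general interface provides.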
- Min Rounds Betweenness Centrality (MRBC) is a provably round-efficient BC algorithm that uses a novel message update rule: a vertex only sends out updates once it knows the data it is going to send is finalized. We prove the correctness of this rule and establish bounds on the maximum number of rounds the algorithm takes. We implement MRBC in the D-Galois distributed graph analytics system, where we further reduce communication overhead by relaxing proxy update frequency based on the message send rule. Evaluation shows that, compared to an implementation of the classic Brandes BC algorithm, MRBC reduces the number of rounds by 14× and communication time by 2.8× on average.
- Graph Transformer Networks (GTNs) are a variant of graph convolutional networks that learn and use important typed paths, called metapaths, in a heterogeneous graph in order to improve task accuracy. The original formulation of the problem uses a series of dense matrix multiplications that is space-inefficient, and the matrix formulation makes it difficult to use fine-grained graph techniques like sampling. We formulate the GTN problem as a graph problem that is more space-efficient because it does not need dense matrices. In addition, because it is formulated as a graph algorithm, we can apply metapath sampling on top of it to significantly decrease the computational load. Evaluation shows that the sampling-based graph formulation of the GTN can be up to 155× faster than the original formulation without compromising task accuracy.
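The contrast between the two GTN formulations can be sketched as follows: instead of multiplying dense typed adjacency matrices, a metapath can be composed by walking sparse typed adjacency lists, with optional per-vertex sampling to bound the work. The data layout, function names, and sampling interface below are hypothetical illustrations, not the dissertation's actual implementation:

```python
import random

def compose_metapath(typed_adj, metapath, num_samples=None, seed=0):
    """Build metapath-neighbor lists by walking typed adjacency lists.

    `typed_adj` maps edge type -> {src: [dst, ...]}; `metapath` is a list
    of edge types. Frontiers are expanded edge by edge, and when
    `num_samples` is set, at most that many successors are kept per
    vertex (a simple stand-in for metapath sampling).
    """
    rng = random.Random(seed)
    first = typed_adj[metapath[0]]
    frontier = {v: list(nbrs) for v, nbrs in first.items()}
    for etype in metapath[1:]:
        adj = typed_adj[etype]
        nxt = {}
        for v, nbrs in frontier.items():
            out = [w for u in nbrs for w in adj.get(u, [])]
            if num_samples is not None and len(out) > num_samples:
                out = rng.sample(out, num_samples)
            if out:
                nxt[v] = out
        frontier = nxt
    return frontier  # v -> vertices reachable via the metapath

# Toy heterogeneous graph with two edge types, "A" and "B".
typed_adj = {
    "A": {0: [1, 2], 3: [2]},
    "B": {1: [4], 2: [4, 5]},
}
paths = compose_metapath(typed_adj, ["A", "B"])
```

The sparse walk only materializes vertices actually reachable along the metapath, whereas a dense formulation would allocate full |V|×|V| matrices per edge type regardless of how sparse the result is.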