Improving distributed graph processing by load balancing and redundancy reduction




Song, Shuang, Ph. D.

Journal Title

Journal ISSN

Volume Title



The amount of data generated every day is growing exponentially in the big data era. A significant portion of this data is stored as graphs in various domains, such as online retail and social networks. Analyzing large-scale graphs provides important insights that are highly utilized in many areas, such as recommendation systems, banking systems, and medical diagnosis. To accommodate analysis on large-scale graphs, developers from industry and academia design the distributed graph processing systems.

However, processing graphs in a distributed manner suffers performance inefficiencies caused by workload imbalance and redundant computations. For instance, while data centers are trending towards a large amount of heterogeneous processing machines, graph partitioners still operate under the assumption of uniform computing resources. This leads to load imbalance that degrades the overall performance. Even with a balanced data distribution, the irregularity of graph applications can result in different amounts of dynamic operations on each machine in the cluster. Such imbalanced work distribution slows down the execution speed. Besides these, redundancy also impacts the performance of distributed graph analysis. To utilize the available parallelism of computing clusters, distributed graph systems deploy the repeated-relaxing computation model (e.g., Bellman-Ford algorithm variants) rather than in a sequential but work-optimal order. Studies performed in this dissertation show that redundant computations pervasively exist and significantly impact the performance efficiency of distributed graph processing.

This dissertation explores novel techniques to reduce the workload imbalance and redundant computations of analyzing large-scale graphs in a distributed setup. It evaluates proposed techniques on both pre-processing and execution modules to enable fair data distribution, lightweight workload balancing, and redundancy optimization for future distributed graph processing systems.

The first contribution of this dissertation is the Heterogeneity-aware Partitioning (HAP) that aims to balance load distribution of distributed graph processing in heterogeneous clusters. HAP proposes a number of methodologies to estimate various machines’ computational power on graph analytics. It also extends several state-of-the-art partitioning algorithms for heterogeneity-aware data distribution. Utilizing the estimation of machines’ graph processing capability to guide extended partitioning algorithms can reduce load imbalance when processing a large-scale graph in heterogeneous clusters. This results in significant performance improvement.

Another contribution of the dissertation is the Hula, which optimizes the workload balance of distributed graph analytics on the fly. Hula offers a hybrid graph partitioning algorithm to split a large-scale graph in a locality-friendly manner and generate metadata for lightweight dynamic workload balancing. To track machines’ work intensity, Hula inserts hardware timers to count the time spent on the important operations (e.g., computational operations and atomic operations). This information can guide Hula’s workload scheduler to arrange work migration. With the support of metadata generated by the hybrid partitioner, Hula’s migration scheme only requires a minimal amount of data to transfer work between machines in the cluster. Hula’s workload balancing design achieves a lightweight imbalance reduction on the fly.

Finally, this dissertation focuses on improving the computational efficiency of distributed graph processing. To do so, it reveals the root cause and the amount of redundant computations in distributed graph processing. SLFE is proposed as a system solution to reduce these redundant operations. SLFE develops a lightweight pre-processing technique to obtain the maximum propagation order of each vertex in a given graph. This information is defined as Redundancy Reduction Guidance (RRG) and is utilized by SLFE’s Redundancy Reduction (RR)-aware computing model to prune redundant operations on the fly. Moreover, SLFE provides RRaware APIs to maintain high promgrammablity. These techniques allow the redundancy optimizations of distributed graph processing to become transparent to users.


LCSH Subject Headings