Browsing by Subject "Betweenness centrality"
Now showing 1 - 2 of 2
- Results Per Page
- Sort Options
Item Accelerating graph computation with system optimizations and algorithmic design(2021-08-06) Hoang, Loc Dac; Pingali, Keshav; Huang, Qixing; Rossbach, Christopher; Wu, BoMost data in today's world can be represented in a graph form, and these graphs can then be used as input to graph applications to derive useful information, such as shortest paths in a road network, similarity between drugs in a drug-protein network, persons of interest in a social network, or recommended products for customers in a customer purchase history graph. Graphs are growing larger as time passes, so there is an ever-growing need for efficient graph applications. Developers typically have two methods for accelerating the runtime of a graph application: (1) they optimize the systems on which the graph application is run on, and/or (2) they optimize the algorithm itself and gain speedup via algorithmic novelties. In this dissertation, I propose work that accelerates graph applications from these two perspectives. Broadly speaking, the work I present in this dissertation will be split into systems work and algorithmic work. On the systems end, I present the CuSP system and the DeepGalois system: 1. CuSP, or Customizable Streaming Partitioner, is a fast and general distributed graph partitioner that generates partitions for distributed graph analytics systems to use. CuSP provides a solution to the problem of existing slow partitioners that can only handle a few built-in policies by providing users with a general interface for specifying streaming partitioning policies that CuSP will then use to efficiently partition graphs. Our evaluation of the system shows that it can partition up to 22× faster than the state-of-the-art offline graph partitioner XtraPulp while producing partitions that allow graph applications to run 2× faster on average than XtraPulp's partitions. CuSP can be extended to allow users to express specific partitioning policies for their algorithms as we show in a case study with distributed triangle counting. 2. DeepGalois is a distributed graph neural network system that uses the observation that graph neural network computation can be expressed as a graph problem which allows it to be implemented in graph analytics systems. DeepGalois is built using existing distributed graph analytics systems; CuSP and the Gluon communication substrate are used to partition GNN graphs and efficiently communicate partial aggregations and gradients. It also supports sampling and minibatching of the graph. Experimental results on up to 128 CPU machines demonstrate that DeepGalois scales and that DeepGalois's epoch time for its best host configurations for the evaluated graphs is on average 2× faster than DistDGL's epoch time for its best host configurations. From an algorithmic perspective, I present a novel round-efficient distributed betweenness centrality and a novel formulation of the graph transformer network as a graph algorithm that allows for more efficient computation. 1. Min Rounds Betweenness Centrality (MRBC) is a provably round efficient BC algorithm that uses a novel message update rule in order to only send out updates from a vertex if it knows that the data it is going to send it finalized. We prove the correctness of this rule as well as establish bounds on the maximum number of rounds the algorithm takes. We implement MRBC in the D-Galois distributed graph analytics system where we further reduce communication overhead by relaxing proxy update frequency based on the message send rule. Evaluation shows that compared to a classic Brandes BC algorithm implementation, it reduces the number of rounds by 14× and communication time by 2.8× on average. 2. Graph Transformer Networks (GTNs) are a variant of graph convolutional networks that learn and use important typed paths called metapaths in a heterogeneous graph in order to improve task accuracy. The original formulation of the problem uses a series of dense matrix multiplies that are space inefficient, and the matrix formulation makes it difficult to use fine-grained graph techniques like sampling. We formulate the GTN problem as a graph problem that is more space efficient as it does not need dense matrices. In addition, because it is formulated as a graph algorithm, we can apply metapath sampling on top of it to significantly decrease the computational load. Evaluation shows that the sampling-based graph formulation of the GTN can be up to 155× faster than the original formulation without any compromise in task accuracy.Item Algorithms for dynamic and distributed networks : shortest paths, betweenness centrality and related problems(2018-01-24) Pontecorvi, Matteo; Ramachandran, Vijaya; Alvisi, Lorenzo; Nasre, Meghana; Plaxton, C. Greg; Zuckerman, DavidIn this thesis we study the problem of computing Betweenness Centrality in dynamic and distributed networks. Betweenness Centrality (BC) is a well-known measure for the relative importance of a node in a social network. It is widely used in applications such as understanding lethality in biological networks, identifying key actors in terrorist networks, supply chain management processes and more. The necessity of computing BC in large networks, especially when they quickly change their topology over time, motivates the study of dynamic algorithms that can perform faster than static ones. Moreover, the current techniques for computing BC requires a deeper understanding of a classic problem in computer science: computing all pairs all shortest paths (APASP) in a graph. One of the main contributions of this thesis is a collection of dynamic algorithms for computing APASP and BC scores which are provably faster than static algorithms for several classes of graphs. We use n = |V| and m = |E| to indicate respectively the number of nodes and edges in a directed positively weighted graph G = (V,E). Our bounds depend on the parameter [nu]* that is defined as the maximum number of edges that lie on shortest paths through any single vertex. The main results in the first part of this thesis are listed below. - A decrease-only algorithm for computing BC and APASP running in time O([nu]* n) that is provably faster than recomputing from scratch in sparse graphs. - An increase-only algorithm for computing BC and APASP that runs in O([nu]*² log n) per update for a sequence of at least [Omega](m*/[nu]*) updates. Here m* is the number of edges in G that lie on shortest paths. This algorithm uses O(m* [nu]*) space. - An increase-only algorithm for computing BC and APASP that runs in O([nu]*² log n) but improves the computational space to O(m*n). - A fully dynamic algorithm for computing BC and APASP that runs in O([nu]*² log³ n) amortized time per update for a sequence of at least [Omega](n) updates. - A refinement of our fully dynamic algorithm that improves the amortized running time to O([nu]*² log² n), saving a logarithmic factor. In the second part of this thesis, we study the case when the input graph is a distributed network of machines and the BC score of each machine, considering its location within the network topology, needs to be computed. In this scenario, each node in the input graph is a self-contained machine with limited knowledge of the network and communication power. Each machine only knows the (virtual) location of the neighbors machines (adjacent nodes in the input graph). The messages, exchanged in each round between machines, cannot exceed a bounded size of at most O(log n) bits. In this distributed model, called CONGEST, we present algorithms for computing BC in near-optimal time for unweighted networks, and some classes of weighted networks. Specifically, our main results are: - A distributed BC algorithm for unweighted undirected graphs completing in at most min(2n+O(D[underscore u]); 4n) rounds, where D[underscore u] is the diameter of the undirected network. - A distributed BC algorithm for unweighted directed graphs completing in at most min(2n + O(D); 4n) rounds, where D is the diameter of the directed network. - A distributed APSP algorithm for unweighted directed graphs completing in at most min(n + O(D); 2n) rounds. - A distributed BC algorithm for weighted directed acyclic graphs (dag) completing in at most 2n + O(L) rounds, where L is the longest length of a path in the dag. - A distributed APSP algorithm for weighted dags completing in at most n + O(L) rounds.