Hewlett Packard Enterprise Data Science Institute

2016-06-11

Fast Distributed Algorithms for Connectivity and MST in Large Graphs

Motivated by the increasing need to understand the algorithmic foundations of distributed large-scale graph computations, we study a number of fundamental graph problems in a message-passing model for distributed computing where k ≥ 2 machines jointly perform computations on graphs with n nodes (typically, n ≫ k). The input graph is assumed to be initially randomly partitioned among the k machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation. Our main result is an (almost) optimal distributed randomized algorithm for graph connectivity. Our algorithm runs in ~O(n/k²) rounds (the ~O notation hides a polylog(n) factor and an additive polylog(n) term). This improves over the best previously known bound of ~O(n/k) [Klauck et al., SODA 2015], and is optimal (up to a polylogarithmic factor) in view of an existing lower bound of ~Ω(n/k²). Our improved algorithm uses several techniques, including linear graph sketching, that prove useful in the design of efficient distributed graph algorithms. We then present fast randomized algorithms for computing minimum spanning trees, (approximate) min-cuts, and for many graph verification problems. All these algorithms take ~O(n/k²) rounds, and are optimal up to polylogarithmic factors. We also show an almost matching lower bound of ~Ω(n/k²) for many graph verification problems using lower bounds in random-partition communication complexity.
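
The abstract names linear graph sketching as one of the techniques. The following is a minimal Python sketch of the linearity property that such sketches generally rely on; it is an illustration only, not the paper's algorithm. It assumes a simple undirected graph with nodes labeled 0..n-1 and uses full, uncompressed signed incidence vectors. Real sketches compress these vectors with random linear maps, which this example omits. The function names (`incidence_vector`, `outgoing_edges`) are hypothetical and chosen for illustration.

```python
from itertools import combinations

def incidence_vector(v, edges, n):
    """Signed incidence vector of node v, indexed by all pairs (a, b) with a < b.

    Entry is +1 if v is the smaller endpoint of an incident edge,
    -1 if v is the larger endpoint, and 0 otherwise.
    Assumes a simple graph (no self-loops, no duplicate edges).
    """
    index = {pair: i for i, pair in enumerate(combinations(range(n), 2))}
    vec = [0] * len(index)
    for a, b in edges:
        a, b = min(a, b), max(a, b)
        if v == a:
            vec[index[(a, b)]] += 1
        elif v == b:
            vec[index[(a, b)]] -= 1
    return vec

def add_vectors(vectors):
    """Coordinate-wise sum of equal-length vectors."""
    return [sum(col) for col in zip(*vectors)]

def outgoing_edges(node_set, edges, n):
    """Edges with exactly one endpoint in node_set.

    Summing the incidence vectors of the nodes in node_set cancels every
    edge internal to the set (+1 and -1), so only cut edges stay nonzero.
    """
    pairs = list(combinations(range(n), 2))
    total = add_vectors([incidence_vector(v, edges, n) for v in node_set])
    return sorted(pairs[i] for i, x in enumerate(total) if x != 0)

# Hypothetical example graph: a triangle 0-1-2 with a path 2-3-4 attached.
n = 5
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
cut = outgoing_edges({0, 1, 2}, edges, n)
```

Because the map from a node to its vector is linear, any linear compression of the vectors can be summed in the same way, so a machine could combine the compressed vectors of a whole component without seeing the individual edge lists.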

SPAA '16: Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures.