Massive Graph Analytics
- Length: 544 pages
- Edition: 1
- Language: English
- Publisher: Chapman and Hall/CRC
- Publication Date: 2022-03-01
- ISBN-10: 0367464128
- ISBN-13: 9780367464127
- Sales Rank: #7686644
Expertise in massive-scale graph analytics is key to solving real-world grand challenges, from health and sustainability to insider-threat detection, cyber defense, and more. Massive Graph Analytics provides a comprehensive introduction to the field, featuring contributions from thought leaders across academia, industry, and government.
The book will benefit students, researchers, and practitioners in academia, national laboratories, and industry who wish to learn about state-of-the-art algorithms, models, frameworks, and software for massive-scale graph analytics.
Table of Contents

Front matter: Cover; Half Title; Series Page; Title Page; Copyright Page; Table of Contents; Editor; Contributors; Introduction

SECTION I: Algorithms: Search and Paths

Chapter 1 A Work-Efficient Parallel Breadth-First Search Algorithm (or How To Cope With the Nondeterminism of Reducers)
  1.1 Introduction; 1.1.1 Outline; 1.2 Background on Dynamic Multithreading; 1.2.1 Multithreaded Pseudocode; 1.2.2 Reducers; 1.3 The PBFS Algorithm; 1.4 The Bag Data Structure; 1.4.1 Pennants; 1.4.2 Bags; 1.4.3 Optimization; 1.5 Experimental Results; 1.5.1 Implementation and Testing; 1.5.2 Results; 1.6 Background on the Dag Model; 1.6.1 The Dag Model; 1.6.2 Determinacy; 1.6.3 Work and Span; 1.6.4 Scheduling; 1.7 Modeling Reducers; 1.8 Analysis of Programs with Nonconstant-Time Reducers; 1.8.1 Scheduling with Reducers; 1.8.2 Analyzing a Performance Dag; 1.8.3 Relating the Performance and User Dags; 1.9 Analyzing PBFS; 1.10 Conclusion; Acknowledgments; References

Chapter 2 Multi-Objective Shortest Paths
  2.1 Introduction; 2.2 Preliminaries; 2.3 More Related Work; 2.4 The Basic Algorithm; 2.4.1 Example; 2.4.2 Algorithm Details; 2.5 A Bicriteria Instantiation – Theory; 2.5.1 Node Local Operations; 2.5.2 The Pareto Queue Q; 2.5.3 Additional Improvements; 2.6 A Practical Bicriteria Implementation; 2.6.1 A Pareto Queue Based on Weight-Balanced B-trees; 2.6.2 Further Practical Improvements; 2.7 Experimental Evaluation; 2.7.1 Pareto Queue Evaluation; 2.7.2 Parallel Bi-Objective Search; 2.7.3 Evaluation of Parallel Bi-Objective Search; 2.8 Single Target Search; 2.8.1 Number of Steps of Goal Directed Search; 2.9 More Objectives; 2.10 Conclusion; Acknowledgments; References

SECTION II: Algorithms: Structure

Chapter 3 Multicore Algorithms for Graph Connectivity Problems
  3.1 Introduction; 3.1.1 Contributions; 3.2 Background; 3.2.1 Strongly Connected Components; 3.2.1.1 Serial Algorithms; 3.2.1.2 Forward-Backward; 3.2.1.3 Coloring; 3.2.1.4 Other Parallel SCC Approaches; 3.2.2 Connected and Weakly Connected Components; 3.2.3 Biconnected Components; 3.3 Multistep Method for SCC Decomposition; 3.3.1 Observations; 3.3.2 Description of Method; 3.3.3 Work Complexity; 3.4 Applying the Multistep Method; 3.4.1 Trim Step; 3.4.2 Breadth-First Search; 3.4.3 Coloring; 3.4.4 Serial Step; 3.4.5 Connected Components and Weakly Connected Components; 3.4.6 Biconnected Components; 3.5 Experimental Setup; 3.6 Experimental Results; 3.6.1 Strongly Connected Component Decomposition; 3.6.2 Connected Components Decomposition; 3.6.3 Biconnected Component Decomposition; 3.7 Conclusions; References

Chapter 4 Distributed Memory Parallel Algorithms for Massive Graphs
  4.1 Introduction; 4.2 Generating Random Graphs; 4.2.1 Erdős–Rényi Model; 4.2.1.1 Efficient Sequential Algorithm; 4.2.1.2 Distributed Memory Parallel Algorithm; 4.2.2 Preferential Attachment Model; 4.2.3 Switching Edges of a Graph and Generating Random Graphs with Given Degree Sequences; 4.3 Counting Triangles; 4.3.1 An Algorithm Using MPI; 4.3.2 Algorithms Using MapReduce; 4.4 Community Detection; 4.5 Giraph: A Distributed Graph Processing Framework; 4.5.1 Parallel Algorithm for the Single Source Shortest Path Problem Using Giraph; 4.6 Breadth-First Search; References

Chapter 5 Efficient Multi-core Algorithms for Computing Spanning Forests and Connected Components
  5.1 Introduction; 5.2 Previous Work; 5.3 Union-Find Algorithms; 5.4 Parallelizing Rem's Algorithm; 5.4.1 Using Locks; 5.4.2 A New Lock-Free Approach; 5.4.3 A Local Algorithm; 5.5 Experiments; 5.6 Concluding Remarks; Acknowledgments; References

Chapter 6 Massive-Scale Distributed Triangle Computation and Applications
  6.1 Introduction; 6.1.1 Preliminary Definitions and Terminology; 6.1.2 Motivation for Enumerating Triangles; 6.1.3 System and Algorithm Design Criteria; 6.2 Algorithmic Primer for Exact Computation; 6.2.1 IEEE HPEC Graph Challenge; 6.2.2 Classical Graph Algorithms; 6.2.3 Linear-Algebraic Techniques; 6.2.3.1 Sparse Linear Algebra; 6.2.3.2 Dense Linear Algebra; 6.3 Distributed Asynchronous Triangle Enumeration; 6.3.1 An Asynchronous Distributed Graph Framework; 6.3.1.1 Communication; 6.3.1.2 Effects of Aggregation and Routing; 6.3.2 Triangle Counting at Massive Scale; 6.3.2.1 1.5M CPUs on IBM BG/Q Sequoia; 6.3.2.2 Scaling on a Linux Cluster; 6.3.3 Asynchronous Distributed κ-Truss Decomposition; 6.3.4 Avoiding Communication; 6.4 Approximate Triangle Counting; 6.4.1 Streaming Graph Algorithms; 6.4.2 A Historical Survey of Approximate Triangle Counting; 6.4.2.1 Sparsification Algorithms; 6.4.2.2 Global Approximation Algorithms; 6.4.2.3 Vertex-Local Approximation Algorithms; 6.4.3 Graph Sketching; 6.4.3.1 Triangle Count Estimation via Sketches; 6.4.4 Discussion of Sampling Versus Sketching; References

SECTION III: Algorithms and Applications

Chapter 7 Computing Top-k Closeness Centrality in Fully Dynamic Graphs
  7.1 Introduction; 7.2 Preliminaries; 7.2.1 Notation and Problem Definition; 7.2.2 Related Work; 7.2.3 Static Top-k Closeness; 7.3 Dynamic Top-k Closeness Centrality; 7.3.1 Updating the Number of Reachable Vertices; 7.3.2 Finding Affected Vertices; 7.3.3 Update After an Edge Insertion; 7.3.4 Update After an Edge Removal; 7.3.5 Running Times and Memory Requirements; 7.4 Experiments; 7.4.1 Experimental Setup; 7.4.2 Speedups on Recomputation; 7.5 Conclusions; Acknowledgments; 7.6 Overview of Networks; 7.7 Additional Experimental Results; 7.7.1 Speedups for Complex Networks; 7.7.2 Speedups for Road Networks; 7.7.3 Running Times for Complex Networks; 7.7.4 Running Times for Road Networks; References

Chapter 8 Ordering Heuristics for Parallel Graph Coloring
  8.1 Introduction; 8.1.1 Ordering Heuristics; 8.1.2 Parallel Greedy Coloring; 8.1.3 Outline; 8.2 The Jones-Plassmann Algorithm; 8.2.1 The Dag Model of Dynamic Multithreading; 8.2.2 Analysis of JP; 8.3 JP with Random Ordering; 8.4 The LF and SL Heuristics; 8.4.1 The LF Ordering Heuristic; 8.4.2 The SL Ordering Heuristic; 8.5 Log Ordering Heuristics; 8.5.1 The LLF Ordering Heuristic; 8.5.2 The SLL Ordering Heuristic; 8.6 Empirical Evaluation; 8.6.1 Experimental Setup; 8.6.2 Coloring Quality of R, LLF, and SLL; 8.6.3 Scalability of JP-R, JP-LLF, and JP-SLL; 8.7 Implementation Techniques; 8.7.1 Join Trees for Reducing Memory Contention; 8.7.2 Bit Vectors for Assigning Colors; 8.7.3 Software Prefetching; 8.8 The SD Heuristic; 8.9 Related Work; 8.10 Conclusion; Acknowledgments; 8.11 Appendix: Performance of Serial Ordering Heuristics; References

Chapter 9 Partitioning Trillion-Edge Graphs
  9.1 Introduction; 9.2 Background; 9.2.1 Graph Partitioning; 9.2.2 Related Work; 9.3 XTRAPULP; 9.3.1 XTRAPULP Overview; 9.3.1.1 Graph Representation; 9.3.1.2 XTRAPULP Algorithm; 9.3.2 XTRAPULP Initialization; 9.3.2.1 ExchangeUpdates; 9.3.3 XTRAPULP Vertex-Balancing Phase; 9.3.4 XTRAPULP Refinement Phase; 9.3.5 XTRAPULP Edge-Balancing Phase; 9.4 Experimental Setup; 9.5 Results; 9.5.1 Performance and Scalability; 9.5.1.1 Scaling on Blue Waters; 9.5.1.2 Trillion-Edge Runs; 9.5.1.3 Scaling on Compton; 9.5.2 Partitioning Quality; 9.6 Conclusions; References

Chapter 10 New Phenomena in Large-Scale Internet Traffic
  10.1 Introduction; 10.2 Methodology; 10.2.1 MAWI and CAIDA Internet Traffic Collection; 10.2.2 Network Quantities from Matrices; 10.2.3 Memory and Computation Requirements; 10.3 Internet Traffic Modeling; 10.3.1 Logarithmic Pooling; 10.3.2 Modified Zipf–Mandelbrot Model; 10.3.3 Nonlinear Model Fitting; 10.4 Results; 10.5 Discussion; 10.6 Conclusions; Acknowledgments; References

Chapter 11 Parallel Algorithms for Butterfly Computations
  11.1 Introduction; 11.2 Preliminaries; 11.3 PARBUTTERFLY Framework; 11.3.1 Counting Framework; 11.3.1.1 Ranking; 11.3.1.2 Wedge Aggregation; 11.3.1.3 Butterfly Aggregation; 11.3.1.4 Other Options; 11.3.2 Peeling Framework; 11.4 PARBUTTERFLY Algorithms; 11.4.1 Preprocessing; 11.4.2 Counting Algorithms; 11.4.2.1 Wedge Retrieval; 11.4.2.2 Per Vertex; 11.4.2.3 Per Edge; 11.4.3 Peeling Algorithms; 11.4.3.1 Per Vertex; 11.4.3.2 Per Edge; 11.4.3.3 Per Vertex/Edge Storing All Wedges; 11.4.4 Approximate Counting; 11.4.5 Approximate Degree Ordering; 11.4.6 Complement Degeneracy Ordering; 11.5 Parallel Fibonacci Heap; 11.5.1 Batch Insertion; 11.5.2 Delete-min; 11.5.3 Batch Decrease-Key; 11.5.4 Application to Bucketing; 11.5.4.1 Retrieving the Minimum Bucket; 11.5.4.2 Updating the Bucketing Structure; 11.6 Experiments; 11.6.1 Environment; 11.6.2 Results; 11.6.2.1 Butterfly Counting; 11.6.2.2 Ranking; 11.6.2.3 Approximate Counting; 11.6.2.4 Butterfly Peeling; 11.6.3 Cache Optimization for Butterfly Counting; 11.6.4 Butterfly Counting; 11.6.4.1 Ranking; 11.6.4.2 Approximate Counting; 11.7 Related Work; 11.8 Conclusion; Acknowledgments; References

SECTION IV: Models

Chapter 12 Recent Advances in Scalable Network Generation
  12.1 Introduction; 12.2 Graph Properties and Uses of Generators; 12.2.1 Graph Properties; 12.2.2 Use Cases; 12.3 General Algorithmic Models and Techniques; 12.3.1 Models of Computation; 12.3.2 Random Permutations; 12.3.3 Basic Sampling Techniques; 12.3.3.1 Bernoulli Sampling; 12.3.3.2 Sampling k Elements from [N]; 12.3.3.3 Rejection Sampling; 12.3.3.4 Weighted Sampling; 12.3.4 Sampling from Huge Sets; 12.4 Basic Models; 12.4.1 Erdős–Rényi's G(n,m) and Gilbert's G(n,p) Models; 12.4.2 Preferential Attachment; 12.4.2.1 Barabási-Albert Model; 12.4.2.2 Node Copy Model; 12.5 Random Spatial Graphs; 12.5.1 Random Geometric Graphs; 12.5.2 Random Hyperbolic Graphs; 12.5.2.1 Threshold Model; 12.5.2.2 Binomial Model; 12.5.3 Geometric Inhomogeneous Random Graphs; 12.5.4 Random Planar Graphs; 12.5.4.1 Random Delaunay Triangulations; 12.5.4.2 Planar Graphs for Infrastructures; 12.6 Random Graphs with Prescribed Degree Sequences; 12.6.1 Chung-Lu; 12.6.2 Configuration Model; 12.6.3 Edge Switching; 12.6.4 Curveball and Global Curveball; 12.7 Block Models; 12.7.1 Stochastic Block Model; 12.7.2 R-MAT / Kronecker Graphs; 12.7.3 LFR; 12.7.4 CKB; 12.8 Graph Replication; 12.8.1 BTER; 12.8.2 Darwini; 12.8.3 Multilevel Generators; 12.8.4 dK-Graphs; 12.9 Additional Graph Types; 12.9.1 Directed Graphs; 12.9.2 Weighted Graphs; 12.9.3 Connected Graphs; 12.9.4 Regular Graphs; 12.9.5 Threshold Graphs; 12.10 Software Packages; 12.11 Future Challenges; Acknowledgments; References

Chapter 13 Computational Models for Cascades in Massive Graphs: How to Spread a Rumor in Parallel
  13.1 Introduction; 13.2 Models for Rumor Diffusion and Epidemics; 13.3 Cascade Dynamics in a Network; 13.3.1 Independent Cascade Model; 13.3.2 Threshold Models; 13.3.3 Complex Contagion Model; 13.4 A Unified Model for Rumor Cascades; 13.4.1 Analytical Model of Influence; 13.4.2 Unified Model of Influence; 13.4.3 Infection Probability Formula Under the Unified Model; 13.4.3.1 Complexity Analysis; 13.4.4 Reduction to Other Models; 13.4.4.1 Independent Cascade Model; 13.4.4.2 Threshold Models; 13.4.4.3 Complex Contagion Model; 13.4.5 Empirical Evaluation of the Unified Model; 13.5 Parallel Temporal Rumor Cascade Analysis in Massive Graphs; 13.6 Conclusions; Acknowledgment; References

Chapter 14 Executing Dynamic Data-Graph Computations Deterministically Using Chromatic Scheduling
  14.1 Introduction; 14.1.1 A Serial Reference Implementation; 14.1.2 Parallelizing Dynamic Data-Graph Computations; 14.1.3 Contributions; 14.1.4 Outline; 14.2 Background; 14.2.1 The Dag Model of Multithreading; 14.2.2 Work-Span Analysis; 14.2.3 Work-Stealing Runtime Systems; 14.2.4 Worker-Local Storage; 14.3 The PRISM Algorithm; 14.3.1 Design Considerations for the Implementation of Multibags; 14.4 The Multibag Data Structure; 14.4.1 Implementation of MB-INSERT and MB-COLLECT; 14.4.2 Analysis of Multibags; 14.5 Analysis of PRISM; 14.5.1 Theoretical Scalability of PRISM; 14.6 Empirical Evaluation; 14.6.1 Experimental Setup; 14.6.2 Overheads for Locking and for Chromatic Scheduling; 14.6.3 Dynamic Scheduling Overhead; 14.6.4 Scalability of PRISM; 14.7 The PRISM-R Algorithm; 14.7.1 Data-Graph Computations with Global Reductions; 14.7.2 PRISM-R; 14.8 The Multivector Data Structure; 14.8.1 The Log-Tree Reducer; 14.8.2 Analysis of Multivector Operations; 14.9 Analysis and Evaluation of PRISM-R; 14.9.1 Work-Span Analysis of PRISM-R; 14.9.2 Empirical Evaluation of PRISM-R; 14.10 Conclusion; 14.11 Acknowledgments; References

SECTION V: Frameworks and Software

Chapter 15 Graph Data Science Using Neo4j
  15.1 Introduction; 15.1.1 What is Graph Data Science?; 15.1.2 Types of Questions for GDS; 15.1.3 The Rise of GDS; 15.2 Neo4j for GDS; 15.2.1 Neo4j Database and Labeled Property Graph; 15.2.2 Cypher Declarative Query Language; 15.2.3 Neo4j GDS Library; 15.2.4 Neo4j Browser and Neo4j Bloom; 15.3 The GDS Journey; 15.3.1 Knowledge Graphs; 15.3.2 Graph Analytics; 15.3.3 Graph Feature Engineering; 15.3.4 Graph Embedding; 15.3.5 Graph Networks; 15.4 Real-World Uses of GDS; 15.4.1 Prominent Commercial Uses; 15.4.1.1 Drug Discovery and Patient Journeys; 15.4.1.2 Recommendations and Targeted Marketing; 15.4.1.3 Fraud Detection; 15.5 Fraud Example; 15.5.1 Query Dataset; 15.5.2 Outlier Removal; 15.5.3 Finding Suspicious Clusters; 15.5.4 Visually Exploring a Suspicious Cluster; 15.5.5 Predicting Fraudsters Using Graph Features; 15.6 Conclusion; References

Chapter 16 The Parallel Boost Graph Library 2.0: Active Messages as a Spanning Model for Parallel Graph Computation
  16.1 Introduction; 16.2 Related Work; 16.2.1 Active Messages; 16.2.2 Parallel Graph Libraries; 16.2.3 Active Messages for Graph Algorithms; 16.3 Programming Model; 16.3.1 Background: Active Pebbles; 16.4 Design of a Graph Application Stack; 16.4.1 Distributed Data Model; 16.4.2 Runtime Communication Transformations; 16.4.3 Separating Expression from Execution; 16.4.3.1 Message Configuration; 16.5 Experimental Evaluation; 16.5.1 Implementation Details; 16.5.2 Results; 16.5.2.1 Breadth-First Search; 16.5.2.2 Δ-Stepping Shortest Paths; 16.5.2.3 Shiloach-Vishkin Connected Components; 16.5.2.4 PageRank; 16.6 Conclusion; References

Chapter 17 RAPIDS cuGraph
  17.1 Introduction; 17.2 Data Science and Workflow Integration; 17.2.1 Ecosystem; 17.2.2 Workflow Integration; 17.3 Technology Stack and Software; 17.3.1 Transformations and Renumbering; 17.3.2 Stack; 17.4 Performance Results; 17.4.1 Single GPU Performance; 17.4.2 Multiple GPU Performance; 17.5 Conclusion; Acknowledgment; References

Chapter 18 A Cloud-Based Approach to Big Graphs
  18.1 Introduction; 18.2 Background; 18.2.1 Graph Definition; 18.2.2 Memory Latency Challenges; 18.2.3 Scalability Challenges; 18.2.4 Fault-Tolerance; 18.3 Cloud-Based Approach; 18.4 Related Work; 18.5 A Few Words on MapReduce; 18.6 Graph Representation; 18.6.1 Edge List; 18.6.2 Accumulo Edge Table; 18.7 Breadth-First Search; 18.7.1 Cloud-Based BFS; 18.7.2 Accumulo Adjacency; 18.7.3 Key-Space Partitioning; 18.7.4 Reduce Task Count; 18.8 Results and Discussion; 18.8.1 Graph500 Benchmark; 18.8.2 Experiments; 18.8.3 Results; 18.9 Conclusion; References

Chapter 19 Introduction to GraphBLAS
  19.1 Adjacency Matrix; 19.2 Incidence Matrix; 19.3 Matrix Values; 19.4 Scalar Operations; 19.5 Matrix Properties; 19.6 0-Element: No Graph Edge; 19.7 Matrix Graph Operations; 19.7.1 Building a Matrix: Edge List to Graph; 19.7.2 Extracting Tuples: Graph to Vertex List; 19.7.3 Transpose: Swap Out-Vertices and In-Vertices; 19.7.4 Matrix Multiplication: Breadth-First Search and Adjacency Matrix Construction; 19.7.5 Extract: Selecting Sub-Graphs; 19.7.6 Assign: Modifying Sub-Graphs; 19.7.7 Element-Wise Addition and Element-Wise Multiplication: Combining Graphs, Intersecting Graphs, and Scaling Graphs; 19.8 Performance; 19.9 Conclusions; References

Chapter 20 Graphulo: Linear Algebra Graph Kernels
  20.1 Introduction; 20.1.1 The Graphulo Initiative; 20.1.2 Outline; 20.2 Associative Arrays and Graph Schemas; 20.2.1 Associative Arrays; 20.2.2 Graph Schemas; 20.2.2.1 Adjacency Matrix; 20.2.2.2 Incidence Matrix; 20.2.2.3 D4M Schema; 20.3 Algorithms; 20.3.1 Centrality; 20.3.2 Subgraph Detection and Vertex Nomination; 20.3.3 Similarity; 20.3.4 Community Detection; 20.3.5 Graph Algorithms and Neural Networks; 20.4 Graphulo Implementation; 20.4.1 Graphulo; 20.4.2 D4M; 20.5 Discussion; 20.6 Conclusions; Acknowledgments; References

Chapter 21 Interactive Graph Analytics at Scale in Arkouda
  21.1 Introduction; 21.2 Arkouda Framework for Data Science; 21.3 Succinct Double-Index Data Structure; 21.3.1 Edge Index and Vertex Index; 21.3.2 Time and Space Complexity Analysis; 21.3.3 Comparison with CSR; 21.4 Multilocale Breadth-First Search and Triangle Counting Algorithms; 21.4.1 Parallel BFS Algorithm; 21.4.1.1 High-Level Multilocale BFS Algorithm; 21.4.1.2 Low-Level Multilocale BFS Algorithm; 21.4.2 Parallel Triangle Counting Algorithm; 21.5 Graph Analytics Workflow; 21.5.1 Edge-Oriented Sparse Graph Partitioning and Building; 21.5.2 Sliding-Window Stream-Based Sketch Building; 21.5.3 Real-World Graph Distributions-Based Regression Model; 21.6 Integration with Arkouda; 21.6.1 Graph Classes Definition in Python and Chapel; 21.6.2 BFS and Triangle Counting Benchmark; 21.7 Experiments; 21.7.1 Experimental Setup for BFS Analysis; 21.7.2 Experimental Results of BFS Algorithm; 21.7.2.1 Graph Building; 21.7.2.2 BFS Performance; 21.7.3 Experimental Setup for Triangle Counting; 21.7.4 Experimental Results of Triangle Counting; 21.7.4.1 Accuracy; 21.7.4.2 Performance; 21.8 Related Work; 21.8.1 BFS Algorithm; 21.8.2 Triangle Counting Algorithm; 21.8.3 Graph Stream Sketch; 21.8.4 Complete Graph Stream Processing Method; 21.9 Conclusion; Acknowledgments; References

Index