Data Mesh: Delivering Data-Driven Value at Scale
- Length: 270 pages
- Edition: 1
- Language: English
- Publisher: O'Reilly Media
- Publication Date: 2022-02-15
- ISBN-10: 1492092398
- ISBN-13: 9781492092391
- Sales Rank: #281533
Many enterprises are investing in a next-generation data lake, hoping to democratize data at scale to provide business insights and ultimately make automated intelligent decisions. In this practical book, author Zhamak Dehghani reveals that, despite the time, money, and effort poured into them, data warehouses and data lakes fail when applied at the scale and speed of today’s organizations. A distributed data mesh is a better choice.
Dehghani guides architects, technical leaders, and decision makers on their journey from monolithic big data architecture to a sociotechnical paradigm that draws from modern distributed architecture. A data mesh considers domains as a first-class concern, applies platform thinking to create self-serve data infrastructure, treats data as a product, and introduces a federated and computational model of data governance. This book shows you why and how.
- Examine the current data landscape from the perspective of business and organizational needs, environmental challenges, and existing architectures
- Analyze the landscape’s underlying characteristics and failure modes
- Get a complete introduction to data mesh principles and its constituents
- Learn how to design a data mesh architecture
- Move beyond a monolithic data lake to a distributed data mesh
Foreword
Preface
Why I Wrote This Book and Why Now; Who Should Read This Book; How to Read This Book; Conventions Used in This Book; O’Reilly Online Learning; How to Contact Us; Acknowledgments
Prologue: Imagine Data Mesh
Data Mesh in Action; A Culture of Data Curiosity and Experimentation; Data culture before data mesh; An Embedded Partnership with Data and ML; Data work before data mesh; The Invisible Platform and Policies; Limitless Scale with Autonomous Data Products; The Positive Network Effect; Why Transform to Data Mesh?; The Way Forward
I. What Is Data Mesh?
1. Data Mesh in a Nutshell
The Outcomes; The Shifts; The Principles; Principle of Domain Ownership; Principle of Data as a Product; Principle of the Self-Serve Data Platform; Principle of Federated Computational Governance; Interplay of the Principles; Data Mesh Model at a Glance; The Data; Operational Data; Analytical Data; The Origin
2. Principle of Domain Ownership
A Brief Background on Domain-Driven Design; Applying DDD’s Strategic Design to Data; Domain Data Archetypes; Source-Aligned Domain Data; Aggregate Domain Data; Consumer-Aligned Domain Data; Transition to Domain Ownership; Push Data Ownership Upstream; Define Multiple Connected Models; Embrace the Most Relevant Domain Data: Don’t Expect a Single Source of Truth; Hide the Data Pipelines as Domains’ Internal Implementation; Recap
3. Principle of Data as a Product
Applying Product Thinking to Data; Baseline Usability Attributes of a Data Product; Discoverable; Addressable; Understandable; Trustworthy and truthful; Natively accessible; Interoperable; Valuable on its own; Secure; Transition to Data as a Product; Include Data Product Ownership in Domains; Reframe the Nomenclature to Create Change; Think of Data as a Product, Not a Mere Asset; Establish a Trust-But-Verify Data Culture; Join Data and Compute as One Logical Unit; Recap
4. Principle of the Self-Serve Data Platform
Data Mesh Platform: Compare and Contrast; Serving Autonomous Domain-Oriented Teams; Managing Autonomous and Interoperable Data Products; A Continuous Platform of Operational and Analytical Capabilities; Designed for a Generalist Majority; Favoring Decentralized Technologies; Domain Agnostic; Data Mesh Platform Thinking; Enable Autonomous Teams to Get Value from Data; Enable data product developers; Enable data product users; Exchange Value with Autonomous and Interoperable Data Products; Create higher-order value by composing data products; Accelerate Exchange of Value by Lowering the Cognitive Load; Abstract complexity through declarative modeling; Abstract complexity through automation; Scale Out Data Sharing; Support a Culture of Embedded Innovation; Transition to a Self-Serve Data Mesh Platform; Design the APIs and Protocols First; Prepare for Generalist Adoption; Do an Inventory and Simplify; Create Higher-Level APIs to Manage Data Products; Build Experiences, Not Mechanisms; Begin with the Simplest Foundation, Then Harvest to Evolve; Recap
5. Principle of Federated Computational Governance
Apply Systems Thinking to Data Mesh Governance; Maintain Dynamic Equilibrium Between Domain Autonomy and Global Interoperability; Introduce feedback loops; Introduce leverage points; Embrace Dynamic Topology as a Default State; Utilize Automation and the Distributed Architecture; Apply Federation to the Governance Model; Federated Team; Domain representatives; Data platform representatives; Subject matter experts; Facilitators and managers; Guiding Values; Localize decisions and responsibility close to the source; Identify cross-cutting concerns that need a global standard; Globalize decisions that facilitate interoperability; Identify consistent experiences that need a global standard; Execute decisions locally; Policies; Local policies; Global policies; Incentives; Introduce local incentives; Introduce global incentives; Apply Computation to the Governance Model; Standards as Code; Policies as Code; Automated Tests; Automated Monitoring; Transition to Federated Computational Governance; Delegate Accountability to Domains; Embed Policy Execution in Each Data Product; Automate Enablement and Monitoring over Interventions; Model the Gaps; Measure the Network Effect; Embrace Change over Constancy; Recap
II. Why Data Mesh?
6. The Inflection Point
Great Expectations of Data; The Great Divide of Data; Scale: Encounter of a New Kind; Beyond Order; Approaching the Plateau of Return; Recap
7. After the Inflection Point
Respond Gracefully to Change in a Complex Business; Align Business, Tech, and Now Analytical Data; Close the Gap Between Analytical and Operational Data; Localize Data Changes to Business Domains; Reduce Accidental Complexity of Pipelines and Copying Data; Sustain Agility in the Face of Growth; Remove Centralized and Monolithic Bottlenecks; Reduce Coordination of Data Pipelines; Reduce Coordination of Data Governance; Enable Autonomy; Increase the Ratio of Value from Data to Investment; Abstract Technical Complexity with a Data Platform; Embed Product Thinking Everywhere; Go Beyond the Boundaries; Recap
8. Before the Inflection Point
Evolution of Analytical Data Architectures; First Generation: Data Warehouse Architecture; Second Generation: Data Lake Architecture; Third Generation: Multimodal Cloud Architecture; Characteristics of Analytical Data Architecture; Monolithic; Monolithic architecture; Monolithic technology; Monolithic organization; The complicated monolith; Centralized Data Ownership; Technology Oriented; Technically partitioned architecture; Activity-oriented team decomposition; Recap
III. How to Design the Data Mesh Architecture
9. The Logical Architecture
Domain-Oriented Analytical Data Sharing Interfaces; Operational Interface Design; Analytical Data Interface Design; Interdomain Analytical Data Dependencies; Data Product as an Architecture Quantum; A Data Product’s Structural Components; The code; Data transformation as code; Interfaces as code; Policy as code; The data and metadata; The platform dependencies; Data Product Data Sharing Interactions; Input data ports; Output data ports; Data Discovery and Observability APIs; The Multiplane Data Platform; A Platform Plane; Data Infrastructure (Utility) Plane; Data Product Experience Plane; Mesh Experience Plane; Example; Embedded Computational Policies; Data Product Sidecar; Policy execution; Standardized protocols and interfaces; Data Product Computational Container; Control Port; Configure policies; Privileged operations; Recap
10. The Multiplane Data Platform Architecture
Design a Platform Driven by User Journeys; Data Product Developer Journey; Incept, Explore, Bootstrap, and Source; Build, Test, Deploy, and Run; Maintain, Evolve, and Retire; Data Product Consumer Journey; Incept, Explore, Bootstrap, Source; Build, Test, Deploy, Run; Maintain, Evolve, and Retire; Recap
IV. How to Design the Data Product Architecture
11. Design a Data Product by Affordances
Data Product Affordances; Data Product Architecture Characteristics; Design Influenced by the Simplicity of Complex Adaptive Systems; Emergent Behavior from Simple Local Rules; No Central Orchestrator; Recap
12. Design Consuming, Transforming, and Serving Data
Serve Data; The Needs of Data Users; Serve Data Design Properties; Multimodal data; Immutable data; Bitemporal data; Impact of bitemporality; Example; States, events, or both; Reduce the opportunity for retracted changes; Read-only access; Serve Data Design; Consume Data; Archetypes of Data Sources; Collaborating operational systems as data sources; Other data products as data sources; Self as a data source; Locality of Data Consumption; Data Consumption Design; Transform Data; Programmatic Versus Nonprogrammatic Transformation; Dataflow-Based Transformation; ML as Transformation; Time-Variant Transformation; Transformation Design; Recap
13. Design Discovering, Understanding, and Composing Data
Discover, Understand, Trust, and Explore; Begin Discovery with Self-Registration; Discover the Global URI; Understand Semantic and Syntax Models; Establish Trust with Data Guarantees; Explore the Shape of Data; Learn with Documentation; Discover, Explore, and Understand Design; Compose Data; Consume Data Design Properties; Traditional Approaches to Data Composability; Compose Data Design; Recap
14. Design Managing, Governing, and Observing Data
Manage the Life Cycle; Manage Life-Cycle Design; Data Product Manifest Components; Govern Data; Govern Data Design; Standardize Policies; Encryption; Access control and identity; Privacy and consent; Data and Policy Integration; Linking Policies; Observe, Debug, and Audit; Observability Design; Observable outputs; Traceability across operational and data planes; Structured and standardized observability data; Domain-oriented observability data; Recap
V. How to Get Started
15. Strategy and Execution
Should You Adopt Data Mesh Today?; Data Mesh as an Element of Data Strategy; Data Mesh Execution Framework; Business-Driven Execution; Benefits of business-driven execution; Challenges of business-driven execution; Guidelines for business-driven execution; Example of business-driven execution; End-to-End and Iterative Execution; Evolutionary Execution; A multiphase evolution model; Domain ownership evolution phases; Data as a product evolution phases; Self-serve platform evolution phases; Federated computational governance evolution phases; Guided evolution with fitness functions; Domain ownership fitness functions; Data as a product fitness functions; Self-serve platform fitness functions; Federated computational governance fitness functions; Migration from legacy; No centralized data architecture coexists with data mesh, unless in transition; Centralized data technologies can be used with data mesh; Bypass the lake and warehouse and go directly to the source; Use the data warehouse as the consuming edge node; Migrate from a warehouse or lake in atomic evolutionary steps; Recap
16. Organization and Culture
Change; Culture; Values; Analytical data is everyone’s responsibility; Connect data across the boundaries to get value; Delight data users; Value the impact of data; Build data products for change, durability, and independence; Balance local data sharing with global interoperability; Close the data collaboration gap with peer-to-peer data sharing; Automate to increase data sharing speed and quality; Reward; Intrinsic Motivations; Extrinsic Motivations; Structure; Organization Structure Assumptions; Data Mesh Team Topologies; Domain data product teams as stream-aligned teams; Data platform teams as platform teams; Federated governance teams as enabling teams; Discover Data Product Boundaries; Start with the existing business domains and subdomains; Data products need long-term ownership; Data products must have independent life cycles; Data products are independently meaningful; Data products boundary Goldilocks zone; Data products without users don’t exist; People; Roles; The data product owner role; The domain data product developer; The platform product owner; Shifting role of the existing data governance office; Changing the role of chief data and analytics officer; Skillset Development; Education drives democratization; Flexible organizations require flexible people; Process; Key Process Changes; Recap
Index
How to download source code?
1. Go to: https://www.oreilly.com/
2. Search for the book title: Data Mesh: Delivering Data-Driven Value at Scale. If the search returns no results, search for the main title only.
3. Click the book title in the search results.
4. In the Publisher resources section, click Download Example Code.