Streaming Data Mesh: A Model for Optimizing Real-Time Data Services

Length: 224 pages
Edition: 1
Language: English
Publisher: O'Reilly Media
Publication Date: 2023-06-27
ISBN-10: 1098130723
ISBN-13: 9781098130725
Sales Rank: #194623

Data lakes and warehouses have become increasingly fragile, costly, and difficult to maintain as data gets bigger and moves faster. Data meshes can help your organization decentralize data, giving ownership back to the engineers who produced it. This book provides a concise yet comprehensive overview of data mesh patterns for streaming and real-time data services.

Authors Hubert Dulay and Stephen Mooney examine the vast differences between streaming and batch data meshes. Data engineers, architects, data product owners, and those in DevOps and MLOps roles will learn steps for implementing a streaming data mesh, from defining a data domain to building a good data product. Through the course of the book, you’ll create a complete self-service data platform and devise a data governance system that enables your mesh to work seamlessly.

With this book, you will:

Design a streaming data mesh using Kafka
Learn how to identify a domain
Build your first data product using self-service tools
Apply data governance to the data products you create
Learn the differences between synchronous and asynchronous data services
Implement self-services that support decentralized data

Preface
    Who Should Read This Book
    Why We Wrote This Book
    Navigating This Book
    Conventions Used in This Book
    Using Code Examples
    O’Reilly Online Learning
    How to Contact Us
    Acknowledgments
        Hubert
        Stephen
1. Data Mesh Introduction
    Data Divide
    Data Mesh Pillars
        Data Ownership
        Data as a Product
        Federated Computational Data Governance
        Self-Service Data Platform
        Data Mesh Diagram
    Other Similar Architectural Patterns
        Data Fabric
        Data Gateways and Data Services
        Data Democratization
        Data Virtualization
    Focusing on Implementation
        Apache Kafka
        AsyncAPI
2. Streaming Data Mesh Introduction
    The Streaming Advantage
        Streaming Enables Real-Time Use Cases
        Streaming Enables Data Optimization Advantages
        Reverse ETL
    The Kappa Architecture
        Lambda Architecture Introduction
        Kappa Architecture Introduction
    Summary
3. Domain Ownership
    Identifying Domains
        Discernible Domains
        Geographic Regions
            Subdomains and subdata mesh
            Data sovereignty
        Hybrid Architecture
        Multicloud
            Disaster recovery
            Analytics
    Avoiding Ambiguous Domains
    Domain-Driven Design
        Domain Model
        Domain Logic
        Bounded Context
        The Ubiquitous Language
    Data Mesh Domain Roles
        Data Product Engineer
        Data Product Owner or Data Steward
    Streaming Data Mesh Tools and Platforms to Consider
    Domain Charge-Backs
    Summary
4. Streaming Data Products
    Defining Data Product Requirements
    Identifying Data Product Derivatives
        Derivatives from Other Domains
    Ingesting Data Product Derivatives with Kafka Connect
        Consumability
            Scalability
            Interoperability and data serialization
        Synchronous Data Sources
        Asynchronous Data Sources and Change Data Capture
        Debezium Connectors
    Transforming Data Derivatives to Data Products
        Data Standardization
        Protecting Sensitive Information
        SQL
            SaaS stream processor
            ksqlDB
                Provisioning connectors in ksqlDB
                User-defined functions in ksqlDB
        Extract, Transform, and Load
            Maintaining data warehouse concepts
            Data warehousing basics
            Dimensional versus fact data in a streaming context
            Materialized views in streams
            Streaming ETL with domain-driven design
    Publishing Data Products with AsyncAPI
        Registering the Streaming Data Product
        Building an AsyncAPI YAML Document
            Objects asyncapi, externalDocs, info, and tags
            Servers and security section
            Channels and topic section
            Components section
                Messages section
                Security schemes section
                Traits section
        Assigning Data Tags
            Quality
            Security
            Throughput
        Versioning
        Monitoring
    Summary
5. Federated Computational Data Governance
    Data Governance in a Streaming Data Mesh
        Data Lineage Graph
        Streaming Data Catalog to Organize Data Products
    Metadata
        Schemas
        Lineage
        Security
        Scalability
    Generating the Data Product Page from AsyncAPI
        Apicurio Registry
        Access Workflow
    Centralized Versus Decentralized
        Centralized Engineers
        Decentralized (Domain) Engineers
    Summary
6. Self-Service Data Infrastructure
    Streaming Data Mesh CLI
    Resource-Related Commands
        Cluster-Related Commands
        Topic-Related Commands
        The domain Commands
        The connect Commands
        The streaming Commands
            The udf command
            The sql command
        Publishing a Streaming Data Product
    Data Governance-Related Services
        Security Services
            Data obfuscation services
                Encryption
                Encryption and decryption UDFs
                Tokenization and detokenization UDFs
                Sensitive information detection
            Identity services
            Auditing
        Standards Services
        Lineage Services
    SaaS Services and APIs
    Summary
7. Architecting a Streaming Data Mesh
    Infrastructure
    Two Architecture Solutions
        Dedicated Infrastructure
            Producing domain architecture
            High-throughput producing domain
            Consuming domain architecture
                Real-time online analytical processing databases
                Consuming domains without a streaming platform
            Recommended architectures
        Multitenant Infrastructure
            Producing domain architecture
            Consuming domain architecture
            Regions
    Streaming Data Mesh Central Architecture
        The Domain Agent (aka Sidecar)
        Data Plane
        Control Plane
            The management plane and metadata and registry plane
            Self-service plane
                Workflow orchestration
                Implementing a DAG for linking
                Implementing a DAG for publishing data products
                Infrastructure as code (IaC)
    Summary
8. Building a Decentralized Data Team
    The Traditional Data Warehouse Structure
    Introducing the Decentralized Team Structure
        Empowering People
        Working Processes
        Fostering Collaboration
        Data-Driven Automation
    New Roles in Data Domains
        New Roles in the Data Plane
        New Roles in Data Science and Business Intelligence
9. Feature Stores
    Separating Data Engineering from Data Science
    Online and Offline Data Stores
    Apache Feast Introduction
    Summary
10. Streaming Data Mesh in Practice
    Streaming Data Mesh Example
    Deploying an On-Premises Streaming Data Mesh
        Installing a Connector
        Deploying Clickstream Connector and Auto-Creating Tables
            Deploy a Datagen connector
            Create the first few nodes
            Create a table-like structure in ksqlDB
        Deploying the Debezium Postgres CDC Connector
        Enrichment of Streaming Data
            Stream versus table
        Publishing the Data Product
    Consuming Streaming Data Products
    Fully Managed SaaS Services
    Summary and Considerations
Index


How to download source code?

1. Go to: https://www.oreilly.com/

2. Search for the book title: Streaming Data Mesh: A Model for Optimizing Real-Time Data Services. If the search returns no results, search for the main title only.

3. Click the book title in the search results.

4. In the Publisher resources section, click Download Example Code.


