Mastering Kafka Streams and ksqlDB: Building Real-Time Data Systems by Example
- Length: 432 pages
- Edition: 1
- Language: English
- Publisher: O'Reilly Media
- Publication Date: 2021-03-23
- ISBN-10: 1492062499
- ISBN-13: 9781492062493
Working with unbounded and fast-moving data streams has historically been difficult. But with Kafka Streams and ksqlDB, building stream processing applications is easy and fun. This practical guide explores the world of real-time data systems through the lens of these popular technologies and explains important stream processing concepts against a backdrop of interesting business problems.
Mitch Seymour, senior data systems engineer at Mailchimp, introduces you to both Kafka Streams and ksqlDB so that you can choose the best tool for each unique stream processing project. Non-Java developers will find the ksqlDB path to be an especially gentle introduction to stream processing. In this book, you’ll learn:
- Basic and advanced uses of Kafka Streams and ksqlDB
- How to transform, enrich, and process event streams
- How to build both stateless and stateful stream processing applications
- The different notions of time and the role it plays in stream processing
- How to build event-driven microservices on top of continuous event streams
- Features, operational characteristics, deployment patterns, and configuration tips for both technologies
Foreword
Preface
   Who Should Read This Book; Navigating This Book; Source Code; Kafka Streams Version; ksqlDB Version; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments

I. Kafka
1. A Rapid Introduction to Kafka
   Communication Model; How Are Streams Stored?; Topics and Partitions; Events; Kafka Cluster and Brokers; Consumer Groups; Installing Kafka; Hello, Kafka; Summary

II. Kafka Streams
2. Getting Started with Kafka Streams
   The Kafka Ecosystem; Before Kafka Streams; Enter Kafka Streams; Features at a Glance; Operational Characteristics; Scalability; Reliability; Maintainability; Comparison to Other Systems; Deployment Model; Processing Model; Kappa Architecture; Use Cases; Processor Topologies; Sub-Topologies; Depth-First Processing; Benefits of Dataflow Programming; Tasks and Stream Threads; High-Level DSL Versus Low-Level Processor API; Introducing Our Tutorial: Hello, Streams; Project Setup; Creating a New Project; Adding the Kafka Streams Dependency; DSL; Processor API; Streams and Tables; Stream/Table Duality; KStream, KTable, GlobalKTable; Summary
3. Stateless Processing
   Stateless Versus Stateful Processing; Introducing Our Tutorial: Processing a Twitter Stream; Project Setup; Adding a KStream Source Processor; Serialization/Deserialization; Building a Custom Serdes; Defining Data Classes; Implementing a Custom Deserializer; Implementing a Custom Serializer; Building the Tweet Serdes; Filtering Data; Branching Data; Translating Tweets; Merging Streams; Enriching Tweets; Avro Data Class; Sentiment Analysis; Serializing Avro Data; Registryless Avro Serdes; Schema Registry–Aware Avro Serdes; Adding a Sink Processor; Running the Code; Empirical Verification; Summary
4. Stateful Processing
   Benefits of Stateful Processing; Preview of Stateful Operators; State Stores; Common Characteristics; Embedded; Multiple access modes; Fault tolerant; Key-based; Persistent Versus In-Memory Stores; Introducing Our Tutorial: Video Game Leaderboard; Project Setup; Data Models; Adding the Source Processors; KStream; KTable; GlobalKTable; Registering Streams and Tables; Joins; Join Operators; Join Types; Co-Partitioning; Value Joiners; KStream to KTable Join (players Join); KStream to GlobalKTable Join (products Join); Grouping Records; Grouping Streams; Grouping Tables; Aggregations; Aggregating Streams; Initializer; Adder; Aggregating Tables; Subtractor; Putting It All Together; Interactive Queries; Materialized Stores; Accessing Read-Only State Stores; Querying Nonwindowed Key-Value Stores; Point lookups; Range scans; All entries; Number of entries; Local Queries; Remote Queries; Summary
5. Windows and Time
   Introducing Our Tutorial: Patient Monitoring Application; Project Setup; Data Models; Time Semantics; Timestamp Extractors; Included Timestamp Extractors; Custom Timestamp Extractors; Registering Streams with a Timestamp Extractor; Windowing Streams; Window Types; Tumbling windows; Hopping windows; Session windows; Sliding join windows; Sliding aggregation windows; Selecting a Window; Windowed Aggregation; Emitting Window Results; Grace Period; Suppression; Filtering and Rekeying Windowed KTables; Windowed Joins; Time-Driven Dataflow; Alerts Sink; Querying Windowed Key-Value Stores; Key + window range scans; Window range scans; All entries; Summary
6. Advanced State Management
   Persistent Store Disk Layout; Fault Tolerance; Changelog Topics; Standby Replicas; Rebalancing: Enemy of the State (Store); Preventing State Migration; Sticky Assignment; Static Membership; Reducing the Impact of Rebalances; Incremental Cooperative Rebalancing; Controlling State Size; Tombstones; Window retention; Aggressive topic compaction; Fixed-size LRU cache; Deduplicating Writes with Record Caches; State Store Monitoring; Adding State Listeners; Adding State Restore Listeners; Built-in Metrics; Interactive Queries; Custom State Stores; Summary
7. Processor API
   When to Use the Processor API; Introducing Our Tutorial: IoT Digital Twin Service; Project Setup; Data Models; Adding Source Processors; Adding Stateless Stream Processors; Creating Stateless Processors; Creating Stateful Processors; Periodic Functions with Punctuate; Accessing Record Metadata; Adding Sink Processors; Interactive Queries; Putting It All Together; Combining the Processor API with the DSL; Processors and Transformers; Putting It All Together: Refactor; Summary

III. ksqlDB
8. Getting Started with ksqlDB
   What Is ksqlDB?; When to Use ksqlDB; Evolution of a New Kind of Database; Kafka Streams Integration; Connect Integration; How Does ksqlDB Compare to a Traditional SQL Database?; Similarities; Differences; Architecture; ksqlDB Server; SQL engine; REST service; ksqlDB Clients; ksqlDB CLI; ksqlDB UI; Deployment Modes; Interactive Mode; Headless Mode; Tutorial; Installing ksqlDB; Running a ksqlDB Server; Precreating Topics; Using the ksqlDB CLI; Summary
9. Data Integration with ksqlDB
   Kafka Connect Overview; External Versus Embedded Connect; External Mode; Embedded Mode; Configuring Connect Workers; Converters and Serialization Formats; Tutorial; Installing Connectors; Creating Connectors with ksqlDB; Showing Connectors; Describing Connectors; Dropping Connectors; Verifying the Source Connector; Interacting with the Kafka Connect Cluster Directly; Introspecting Managed Schemas; Summary
10. Stream Processing Basics with ksqlDB
   Tutorial: Monitoring Changes at Netflix; Project Setup; Source Topics; Data Types; Custom Types; Collections; Creating Source Collections; With Clause; Working with Streams and Tables; Showing Streams and Tables; Describing Streams and Tables; Altering Streams and Tables; Dropping Streams and Tables; Basic Queries; Insert Values; Simple Selects (Transient Push Queries); Projection; Filtering; Wildcards; Logical operators; Between (range filter); Flattening/Unnesting Complex Structures; Conditional Expressions; Coalesce; IFNULL; Case Statements; Writing Results Back to Kafka (Persistent Queries); Creating Derived Collections; Showing queries; Explaining queries; Terminating queries; Putting It All Together; Summary
11. Intermediate and Advanced Stream Processing with ksqlDB
   Project Setup; Bootstrapping an Environment from a SQL File; Data Enrichment; Joins; Casting a column to a new type; Repartitioning data; Persistent joins; Windowed Joins; Aggregations; Aggregation Basics; Windowed Aggregations; Delayed data; Window retention; Materialized Views; Clients; Pull Queries; Curl; Push Queries; Push Queries via Curl; Functions and Operators; Operators; Showing Functions; Describing Functions; Creating Custom Functions; Stop-word removal UDF; Additional Resources for Custom ksqlDB Functions; Summary

IV. The Road to Production
12. Testing, Monitoring, and Deployment
   Testing; Testing ksqlDB Queries; Testing Kafka Streams; Unit tests; DSL; Processor API; Behavioral Tests; Benchmarking; Kafka Cluster Benchmarking; Final Thoughts on Testing; Monitoring; Monitoring Checklist; Extracting JMX Metrics; Deployment; ksqlDB Containers; Kafka Streams Containers; Container Orchestration; Operations; Resetting a Kafka Streams Application; Rate-Limiting the Output of Your Application; Upgrading Kafka Streams; Upgrading ksqlDB; Summary

A. Kafka Streams Configuration
   Configuration Management; Configuration Properties; Consumer-Specific Configurations
B. ksqlDB Configuration
   Query Configurations; Server Configurations; Security Configurations
Index
How to download source code?
1. Go to: https://www.oreilly.com/
2. Search for the book title: Mastering Kafka Streams and ksqlDB: Building Real-Time Data Systems by Example. If the full title returns no results, search for the main title only.
3. Click the book title in the search results.
4. In the Publisher resources section, click Download Example Code.