Cassandra: The Definitive Guide: Distributed Data at Web Scale, Revised 3rd Edition
- Length: 432 pages
- Edition: 3
- Language: English
- Publisher: O'Reilly Media
- Publication Date: 2022-02-22
- ISBN-10: 1492097144
- ISBN-13: 9781492097143
- Sales Rank: #1622994
Imagine what you could do if scalability wasn’t a problem. With this hands-on guide, you’ll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This revised third edition–updated for Cassandra 4.0 and new developments in the Cassandra ecosystem, including deployments in Kubernetes with K8ssandra–provides technical details and practical examples to help you put this database to work in a production environment.
Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra’s nonrelational design, with special attention to data modeling. Developers, DBAs, and application architects looking to solve a database scaling issue or future-proof an application will learn how to harness Cassandra’s speed and flexibility.
- Understand Cassandra’s distributed and decentralized structure
- Use the Cassandra Query Language (CQL) and cqlsh (the CQL shell)
- Create a working data model and compare it with an equivalent relational model
- Design and develop applications using client drivers
- Explore cluster topology and learn how nodes exchange data
- Maintain a high level of performance in your cluster
- Deploy Cassandra onsite, in the cloud, or with Docker and Kubernetes
- Integrate Cassandra with Spark, Kafka, Elasticsearch, Solr, and Lucene
Foreword Preface Why Apache Cassandra? Is This Book for You? What’s in This Book? New for the Third Edition Note on the Revised Third Edition Conventions Used in This Book Using Code Examples O’Reilly Interactive Katacoda Scenarios O’Reilly Online Learning How to Contact Us Acknowledgments
1. Beyond Relational Databases What’s Wrong with Relational Databases? A Quick Review of Relational Databases Transactions, ACID-ity, and Two-Phase Commit Schema Sharding and Shared-Nothing Architecture Web Scale The Rise of NoSQL Summary
2. Introducing Cassandra The Cassandra Elevator Pitch Cassandra in 50 Words or Less Distributed and Decentralized Elastic Scalability High Availability and Fault Tolerance Tuneable Consistency Brewer’s CAP Theorem Row-Oriented High Performance Where Did Cassandra Come From? Is Cassandra a Good Fit for My Project? Large Deployments Lots of Writes, Statistics, and Analysis Geographical Distribution Hybrid Cloud and Multicloud Deployment Getting Involved Summary
3. Installing Cassandra Installing the Apache Distribution Extracting the Download What’s in There? Building from Source Additional Build Targets Running Cassandra Setting the Environment Starting the Server Stopping Cassandra Other Cassandra Distributions Running the CQL Shell Basic cqlsh Commands cqlsh Help Describing the Environment in cqlsh Creating a Keyspace and Table in cqlsh Writing and Reading Data in cqlsh Running Cassandra in Docker Summary
4. The Cassandra Query Language The Relational Data Model Cassandra’s Data Model Clusters Keyspaces Tables Columns Timestamps Time to live (TTL) CQL Types Numeric Data Types Textual Data Types Time and Identity Data Types Other Simple Data Types Collections Tuples User-Defined Types Summary
5. Data Modeling Conceptual Data Modeling RDBMS Design Design Differences Between RDBMS and Cassandra No joins No referential integrity Denormalization Query-first design Designing for optimal storage Sorting is a design decision Defining Application Queries Logical Data Modeling Hotel Logical Data Model Reservation Logical Data Model Physical Data Modeling Hotel Physical Data Model Reservation Physical Data Model Evaluating and Refining Calculating Partition Size Calculating Size on Disk Breaking Up Large Partitions Defining Database Schema Cassandra Data Modeling Tools Summary
6. The Cassandra Architecture Data Centers and Racks Gossip and Failure Detection Snitches Rings and Tokens Virtual Nodes Partitioners Replication Strategies Consistency Levels Queries and Coordinator Nodes Hinted Handoff Anti-Entropy, Repair, and Merkle Trees Lightweight Transactions and Paxos Memtables, SSTables, and Commit Logs Bloom Filters Caching Compaction Deletion and Tombstones Managers and Services Cassandra Daemon Storage Engine Storage Service Storage Proxy Messaging Service Stream Manager CQL Native Transport Server System Keyspaces Summary
7. Designing Applications with Cassandra Hotel Application Design Cassandra and Microservice Architecture Microservice Architecture for a Hotel Application Identifying Bounded Contexts Identifying Services Designing Microservice Persistence Polyglot persistence Representing other database models in CQL Extending Designs Secondary Indexes Materialized Views Reservation Service: A Sample Microservice Design Choices for a Java Microservice Deployment and Integration Considerations Services, Keyspaces, and Clusters Data Centers and Load Balancing Interactions Between Microservices Summary
8. Application Development with Drivers DataStax Java Driver Development Environment Configuration Connecting to a Cluster Statements Simple Statements Prepared Statements Bound statement Query Builder Object Mapper Asynchronous Execution Driver Configuration File-based configuration Basic configuration options Load balancing Retrying failed queries Speculative execution Connection pooling Protocol version Compression Driver security Execution profiles Metadata Node discovery Schema access Debugging and Monitoring Driver logging Driver metrics DataStax Python Driver DataStax Node.js Driver DataStax C# Driver Other Cassandra Drivers Summary
9. Writing and Reading Data Writing Write Consistency Levels The Cassandra Write Path Writing Files to Disk Commit log files SSTable files Lightweight Transactions Batches Reading Read Consistency Levels The Cassandra Read Path Read Repair Range Queries, Ordering and Filtering Paging Deleting Summary
10. Configuring and Deploying Cassandra Cassandra Cluster Manager Creating a Cluster Adding Nodes to a Cluster Dynamic Ring Participation Node Configuration Seed Nodes Snitches Partitioners Tokens and Virtual Nodes Network Interfaces Data Storage Startup and JVM Settings Planning a Cluster Deployment Cluster Topology and Replication Strategies Sizing Your Cluster Selecting Instances Storage Network Cloud Deployment Amazon Web Services Google Cloud Platform Microsoft Azure Summary
11. Monitoring Monitoring Cassandra with JMX Cassandra’s MBeans Database MBeans Storage Service MBean Storage Proxy MBean Hints Service MBean Column Family Store MBean Commit Log MBean Compaction Manager MBean Cache Service MBean Cluster-Related MBeans Gossiper MBean Failure Detector MBean Snitch MBeans Stream Manager MBean Messaging Service MBean Internal MBeans Thread Pool MBeans Garbage Collection MBeans Security MBeans Metrics MBeans Monitoring with nodetool Getting Cluster Information describecluster status info ring Getting Statistics Using tpstats Using tablestats Virtual Tables System Virtual Schema System Views Metrics Logging Examining Log Files Full Query Logging Summary
12. Maintenance Health Check Common Maintenance Tasks Flush Cleanup Repair Full repair, incremental repair, and anti-compaction Sequential and parallel repair Partitioner range repair Subrange repair Best practices for repair Rebuilding Indexes Moving Tokens Adding Nodes Adding Nodes to an Existing Data Center Adding a Data Center to a Cluster Handling Node Failure Repairing Failed Nodes Recovering from disk failure Replacing Nodes Removing Nodes Decommissioning a node Removing a node Assassinating a node Removing a data center Upgrading Cassandra Backup and Recovery Taking a Snapshot Clearing a Snapshot Enabling Incremental Backup Restoring from Snapshot SSTable Utilities Maintenance Tools Netflix Priam DataStax OpsCenter Cassandra Sidecars Cassandra Kubernetes Operators Summary
13. Performance Tuning Managing Performance Setting Performance Goals Benchmarking and Stress Testing Using cassandra-stress Additional load testing tools Monitoring Performance Analyzing Performance Issues Tracing Tuning Methodology Caching Key Cache Row Cache Chunk Cache Counter Cache Saved Cache Settings Memtables Commit Logs SSTables Hinted Handoff Compaction Concurrency and Threading Networking and Timeouts JVM Settings Memory Garbage Collection Default configuration (JDK 8 or JDK 11) Garbage-First garbage collector (JDK 8 or JDK 11) Z Garbage Collector (JDK 11 and later) Shenandoah Garbage Collector (JDK12) Summary
14. Security Authentication and Authorization Password Authenticator Configuring the authenticator Additional authentication providers Adding users Authenticating via the DataStax Java Driver Using CassandraAuthorizer Role-Based Access Control Encryption SSL, TLS, and Certificates Generating Certificates for Development Clusters Generating Certificates for Production Clusters Node-to-Node Encryption Client-to-Node Encryption JMX Security Securing JMX Access Security MBeans Authentication cache MBean Audit Logging Summary
15. Migrating and Integrating Knowing When to Migrate Adapting the Data Model Translating Entities Translating Relationships Adapting the Application Refactoring Data Access Maintaining Consistency Migrating Stored Procedures User-defined functions User-defined aggregates Built-in functions and aggregates Planning the Deployment Migrating Data Zero-Downtime Migration Bulk Loading Common Integrations Managing Data Flow with Apache Kafka Searching with Apache Lucene, SOLR, and Elasticsearch Analyzing Data with Apache Spark Use cases for Spark with Cassandra Deploying Spark with Cassandra The spark-cassandra-connector Summary
Index
How to download source code?
1. Go to: https://www.oreilly.com/
2. Search for the book title: Cassandra: The Definitive Guide: Distributed Data at Web Scale, Revised 3rd Edition. If the full title returns no results, search for the main title only.
3. Click the book title in the search results.
4. In the Publisher resources section, click Download Example Code.