Chaos Engineering
- Length: 308 pages
- Edition: 1
- Language: English
- Publisher: O'Reilly Media
- Publication Date: 2020-05-05
- ISBN-10: 1492043869
- ISBN-13: 9781492043867
- Sales Rank: #221358
As more companies move toward microservices and other distributed technologies, the complexity of these systems increases. You can’t remove the complexity, but through Chaos Engineering you can discover vulnerabilities and prevent outages before they impact your customers. This practical guide shows engineers how to navigate complex systems while optimizing to meet business goals.
Two of the field’s prominent figures, Casey Rosenthal and Nora Jones, pioneered the discipline while working together at Netflix. In this book, they expound on the what, how, and why of Chaos Engineering while facilitating a conversation among practitioners across industries. Many chapters are written by contributing authors to widen the perspective across verticals within (and beyond) the software industry.
- Learn how Chaos Engineering enables your organization to navigate complexity
- Explore a methodology to avoid failures within your application, network, and infrastructure
- Move from theory to practice through real-world stories from industry experts at Google, Microsoft, Slack, and LinkedIn, among others
- Establish a framework for thinking about complexity within software systems
- Design a Chaos Engineering program around game days and move toward highly targeted, automated experiments
- Learn how to design continuous collaborative chaos experiments
Table of Contents
- Copyright
- Preface
  Conventions Used in This Book · O’Reilly Online Learning · How to Contact Us · Acknowledgments
- Introduction: Birth of Chaos
  Management Principles as Code · Chaos Monkey Is Born · Going Big · Formalizing the Discipline · Community Is Born · Fast Evolution
Part I. Setting the Stage
- Chapter 1. Encountering Complex Systems
  Contemplating Complexity · Encountering Complexity · Example 1: Mismatch Between Business Logic and Application Logic · Example 2: Customer-Induced Retry Storm · Example 3: Holiday Code Freeze · Confronting Complexity · Accidental Complexity · Essential Complexity · Embracing Complexity
- Chapter 2. Navigating Complex Systems
  Dynamic Safety Model · Economics · Workload · Safety · Economic Pillars of Complexity · State · Relationships · Environment · Reversibility · Economic Pillars of Complexity Applied to Software · The Systemic Perspective
- Chapter 3. Overview of Principles
  What Chaos Engineering Is · Experimentation Versus Testing · Verification Versus Validation · What Chaos Engineering Is Not · Breaking Stuff · Antifragility · Advanced Principles · Build a Hypothesis Around Steady-State Behavior · Vary Real-World Events · Run Experiments in Production · Automate Experiments to Run Continuously · Minimize Blast Radius · The Future of “The Principles”
Part II. Principles in Action
- Chapter 4. Slack’s Disasterpiece Theater
  Retrofitting Chaos · Design Patterns Common in Older Systems · Design Patterns Common in Newer Systems · Getting to Basic Fault Tolerance · Disasterpiece Theater · Goals · Anti-Goals · The Process · Preparation · The Exercise · Debriefing · How the Process Has Evolved · Getting Management Buy-In · Results · Avoid Cache Inconsistency · Try, Try Again (for Safety) · Impossibility Result · Conclusion
- Chapter 5. Google DiRT: Disaster Recovery Testing
  Life of a DiRT Test · The Rules of Engagement · What to Test · How to Test · Gathering Results · Scope of Tests at Google · Conclusion
- Chapter 6. Microsoft Variation and Prioritization of Experiments
  Why Is Everything So Complicated? · An Example of Unexpected Complications · A Simple System Is the Tip of the Iceberg · Categories of Experiment Outcomes · Known Events/Unexpected Consequences · Unknown Events/Unexpected Consequences · Prioritization of Failures · Explore Dependencies · Degree of Variation · Varying Failures · Combining Variation and Prioritization · Expanding Variation to Dependencies · Deploying Experiments at Scale · Conclusion
- Chapter 7. LinkedIn Being Mindful of Members
  Learning from Disaster · Granularly Targeting Experiments · Experimenting at Scale, Safely · In Practice: LinkedOut · Failure Modes · Using LiX to Target Experiments · Browser Extension for Rapid Experimentation · Automated Experimentation · Conclusion
- Chapter 8. Capital One Adoption and Evolution of Chaos Engineering
  A Capital One Case Study · Blind Resiliency Testing · Transition to Chaos Engineering · Chaos Experiments in CI/CD · Things to Watch Out for While Designing the Experiment · Tooling · Team Structure · Evangelism · Conclusion
Part III. Human Factors
- Chapter 9. Creating Foresight
  Chaos Engineering and Resilience · Steps of the Chaos Engineering Cycle · Designing the Experiment · Tool Support for Chaos Experiment Design · Effectively Partnering Internally · Understand Operating Procedures · Discuss Scope · Hypothesize · Conclusion
- Chapter 10. Humanistic Chaos
  Humans in the System · Putting the “Socio” in Sociotechnical Systems · Organizations Are a System of Systems · Engineering Adaptive Capacity · Spotting Weak Signals · Failure and Success, Two Sides of the Same Coin · Putting the Principles into Practice · Build a Hypothesis · Vary Real-World Events · Minimize the Blast Radius · Case Study 1: Gaming Your Game Days · Communication: The Network Latency of Any Organization · Case Study 2: Connecting the Dots · Leadership Is an Emergent Property of the System · Case Study 3: Changing a Basic Assumption · Safely Organizing the Chaos · All You Need Is Altitude and a Direction · Close the Loops · If You’re Not Failing, You’re Not Learning
- Chapter 11. People in the Loop
  The Why, How, and When of Experiments · The Why · The How · The When · Functional Allocation, or Humans-Are-Better-At/Machines-Are-Better-At · The Substitution Myth · Conclusion
- Chapter 12. The Experiment Selection Problem (and a Solution)
  Choosing Experiments · Random Search · The Age of the Experts · Observability: The Opportunity · Observability for Intuition Engineering · Conclusion
Part IV. Business Factors
- Chapter 13. ROI of Chaos Engineering
  Ephemeral Nature of Incident Reduction · Kirkpatrick Model · Level 1: Reaction · Level 2: Learning · Level 3: Transfer · Level 4: Results · Alternative ROI Example · Collateral ROI · Conclusion
- Chapter 14. Open Minds, Open Science, and Open Chaos
  Collaborative Mindsets · Open Science; Open Source · Open Chaos Experiments · Experiment Findings, Shareable Results · Conclusion
- Chapter 15. Chaos Maturity Model
  Adoption · Who Bought into Chaos Engineering · How Much of the Organization Participates in Chaos Engineering · Prerequisites · Obstacles to Adoption · Sophistication · Putting It All Together
Part V. Evolution
- Chapter 16. Continuous Verification
  Where CV Comes From · Types of CV Systems · CV in the Wild: ChAP · ChAP: Selecting Experiments · ChAP: Running Experiments · The Advanced Principles in ChAP · ChAP as Continuous Verification · CV Coming Soon to a System Near You · Performance Testing · Data Artifacts · Correctness
- Chapter 17. Let’s Get Cyber-Physical
  The Rise of Cyber-Physical Systems · Functional Safety Meets Chaos Engineering · FMEA and Chaos Engineering · Software in Cyber-Physical Systems · Chaos Engineering as a Step Beyond FMEA · Probe Effect · Addressing the Probe Effect · Conclusion
- Chapter 18. HOP Meets Chaos Engineering
  What Is Human and Organizational Performance (HOP)? · Key Principles of HOP · Principle 1: Error Is Normal · Principle 2: Blame Fixes Nothing · Principle 3: Context Drives Behavior · Principle 4: Learning and Improving Is Vital · Principle 5: Intentional Response Matters · HOP Meets Chaos Engineering · Chaos Engineering and HOP in Practice · Conclusion
- Chapter 19. Chaos Engineering on a Database
  Why Do We Need Chaos Engineering? · Robustness and Stability · A Real-World Example · Applying Chaos Engineering · Our Way of Embracing Chaos · Fault Injection · Fault Injection in Applications · Fault Injection in CPU and Memory · Fault Injection in the Network · Fault Injection in the Filesystem · Detecting Failures · Automating Chaos · Automated Experimentation Platform: Schrodinger · Schrodinger Workflow · Conclusion
- Chapter 20. The Case for Security Chaos Engineering
  A Modern Approach to Security · Human Factors and Failure · Remove the Low-Hanging Fruit · Feedback Loops · Security Chaos Engineering and Current Methods · Problems with Red Teaming · Problems with Purple Teaming · Benefits of Security Chaos Engineering · Security Game Days · Example Security Chaos Engineering Tool: ChaoSlingr · The Story of ChaoSlingr · Conclusion
- Chapter 21. Conclusion
- Index
- About the Authors
- Colophon
How to download the source code?
1. Go to: https://www.oreilly.com/
2. Search for the book title: Chaos Engineering. If the search returns no results, try searching for the main title only.
3. Click the book title in the search results.
4. In the Publisher resources section, click Download Example Code.