Observability Engineering: Achieving Production Excellence
- Length: 400 pages
- Edition: 1
- Language: English
- Publisher: O'Reilly Media
- Publication Date: 2022-06-14
- ISBN-10: 1492076449
- ISBN-13: 9781492076445
- Sales Rank: #500640
Observability is critical for engineering, managing, and improving complex business-critical systems. By practicing observability, any software engineering team can gain a deeper understanding of system performance, so it can perform ongoing maintenance and ship the features its customers need. This practical book explains the value of observable systems and shows you how to build an observability-driven development practice.
Authors Charity Majors, Liz Fong-Jones, and George Miranda from Honeycomb explain what constitutes good observability, show you how to make improvements from what you’re doing today, and provide practical dos and don’ts for migrating from legacy tooling, such as metrics monitoring and log management. You’ll also learn the impact observability has on organization culture.
You’ll explore:
- The value of practicing observability when delivering and managing complex cloud native applications and systems
- The impact observability has across the entire software engineering cycle
- Software ownership: how different functional teams help achieve system SLOs
- How software developers contribute to customer experience and business impact
- How to produce quality code for context-aware system debugging and maintenance
- How data-rich analytics can help you find answers quickly when maintaining site reliability
Preface
- Who this is for
- Why we wrote this book
- What you will learn
1. What is Observability?
- The mathematical definition of observability
- Applying observability to software systems
- Mischaracterizations of observability for software
- Why observability matters now
- Is this really the best way?
- Why are metrics and monitoring not enough?
- Debugging with metrics vs. observability
- The role of cardinality
- Debugging with observability
- Observability is for modern systems
- Conclusion
2. How Observability Differs from Monitoring
- How monitoring data is used
- Troubleshooting behaviors when using dashboards
- The limitations of troubleshooting by intuition
- Traditional monitoring is fundamentally reactive
- How observability is different
- Conclusion
3. Lessons from Scaling Without Observability
- An introduction to Parse
- Scaling at Parse
- The evolution toward modern systems
- The evolution toward modern practices
- Shifting practices at Parse
- Conclusion
4. How Observability Relates to DevOps, SRE, and Cloud Native
- Cloud Native, DevOps, and SRE in a nutshell
- Observability: Debugging Then vs. Now
- Observability empowers DevOps and SRE practices
5. Structured Events Are the Building Blocks of Observability
- Debugging with structured events
- The limitations of metrics as a building block
- The limitations of unstructured data as a building block
- Properties of events that are useful in debugging
- Conclusion
6. Stitching Events into Traces
- Distributed tracing and why it matters now
- The components of tracing
- Instrumenting a trace the hard way
- Adding custom fields into trace spans
- Stitching events into traces
- Conclusion
7. Analyzing Events to Achieve Observability
- Debugging from known conditions
- Debugging from first principles
- The core analysis loop
- Automating the brute force portion of the core analysis loop
- The misleading promise of AIOps
- Conclusion
8. How observability and monitoring come together
- Where monitoring fits
- Infrastructure considerations vs. software considerations
- Assessing your organizational needs
- Exceptions: infrastructure monitoring that can’t be ignored
- Real world examples
- Conclusion
9. Applying observability practices in your team
- Join a community group
- Start with the biggest pain points
- Buy instead of build
- Flesh out your instrumentation iteratively
- Look for opportunities to leverage existing efforts
- The last push is the hardest to complete
- Conclusion
10. Observability-Driven Development
- Test-driven development
- Observability in the development cycle
- Determining where to debug
- Debugging in the time of microservices
- How instrumentation drives observability
- Shifting observability left
11. Using Service Level Objectives for Reliability
- Introduction to Service Level Objectives
- Traditional Monitoring Approaches Create Dangerous Alert Fatigue
- Distributed Systems Exacerbate the Alerting Problem
- Static Thresholds Can’t Reliably Indicate Degraded User Experience
- Reliable Alerting with SLOs
- Changing Culture Toward SLO-Based Alerts: A Case Study
- Conclusion
12. Using observability data to model actionable SLOs
- Alerting before your error budget is empty
- Framing time as a sliding window
- Forecast models to create a predictive burn alert
- The lookahead window
- The baseline window
- Acting on SLO burn alerts
- Observability data for SLOs vs. time series data
- Conclusion
13. Cheap and Accurate Enough: Sampling
- Sampling to refine your data collection
- Different approaches to sampling
- Constant-probability sampling
- Sampling on recent traffic volume
- Sampling based on event content (keys)
- Combining per-key and historical methods
- Choosing dynamic sampling options
- When to make a sampling decision for traces
- Translating sampling strategies into code
- The base case
- Fixed-rate sampling
- Recording the sample rate
- Consistent sampling
- Target Rate Sampling
- Having more than one static sample rate
- Sampling by key and target rate
- Sampling with dynamic rates on arbitrarily many keys
- Putting it all together: head and tail per-key target rate sampling
- Conclusion
14. Build vs. Buy and Return on Investment
- How to analyze the ROI of observability
- The real costs of building your own
- The hidden costs of using “free” software
- The benefits of building your own
- The risks of building your own
- The real costs of buying software
- The hidden financial costs of commercial software
- The hidden non-financial costs of commercial software
- The benefits of buying commercial software
- The risks of buying commercial software
- Buy vs. Build is not a binary choice
- Conclusion
15. The Business Case for Observability
- The reactive approach to introducing change
- The proactive approach to introducing change
- Introducing observability as a practice
- Using the appropriate tools
- Instrumentation
- Data storage and analytics
- Rolling out tools to your teams
- Knowing when you have enough observability
- Conclusion
16. An Observability Maturity Model
- A foreword about maturity models
- Why observability needs a maturity model
- About the Observability Maturity Model
- Capabilities referenced in the OMM
- Respond to system failure with resilience
- Deliver high quality code
- Manage complexity and technical debt
- Release on a predictable cadence
- Understand user behavior
- Using the OMM for your organization
- Conclusion
About the Authors
How to download source code?
1. Go to: https://www.oreilly.com/
2. Search for the book title: Observability Engineering: Achieving Production Excellence. If you do not get any results, search for the main title only.
3. Click the book title in the search results.
4. In the Publisher resources section, click Download Example Code.