Observability Engineering: Achieving Production Excellence

by Charity Majors, George Miranda, Liz Fong-Jones

Length: 400 pages
Edition: 1
Language: English
Publisher: O'Reilly Media
Publication Date: 2022-06-14
ISBN-10: 1492076449
ISBN-13: 9781492076445
Sales Rank: #500640

Observability is critical for engineering, managing, and improving complex business-critical systems. Through observability, any software engineering team can gain a deeper understanding of system performance, so it can perform ongoing maintenance and ship the features its customers need. This practical book explains the value of observable systems and shows you how to build an observability-driven development practice.

Authors Charity Majors, Liz Fong-Jones, and George Miranda from Honeycomb explain what constitutes good observability, show you how to improve on what you’re doing today, and provide practical dos and don’ts for migrating from legacy tooling, such as metrics monitoring and log management. You’ll also learn the impact observability has on organizational culture.

You’ll explore:

The value of practicing observability when delivering and managing complex cloud native applications and systems
The impact observability has across the entire software engineering cycle
Software ownership: how different functional teams help achieve system SLOs
How software developers contribute to customer experience and business impact
How to produce quality code for context-aware system debugging and maintenance
How data-rich analytics can help you find answers quickly when maintaining site reliability

Preface
    Who this is for
    Why we wrote this book
    What you will learn
1. What is Observability?
    The mathematical definition of observability
    Applying observability to software systems
    Mischaracterizations of observability for software
    Why observability matters now
    Is this really the best way?
    Why are metrics and monitoring not enough?
    Debugging with metrics vs. observability
    The role of cardinality
    Debugging with observability
    Observability is for modern systems
    Conclusion
2. How Observability Differs from Monitoring
    How monitoring data is used
        Troubleshooting behaviors when using dashboards
        The limitations of troubleshooting by intuition
        Traditional monitoring is fundamentally reactive
    How observability is different
    Conclusion
3. Lessons from Scaling Without Observability
    An introduction to Parse
    Scaling at Parse
    The evolution toward modern systems
    The evolution toward modern practices
    Shifting practices at Parse
    Conclusion
4. How Observability Relates to DevOps, SRE, and Cloud Native
    Cloud Native, DevOps, and SRE in a nutshell
    Observability: Debugging Then vs. Now
        Observability empowers DevOps and SRE practices
5. Structured Events Are the Building Blocks of Observability
    Debugging with structured events
    The limitations of metrics as a building block
    The limitations of unstructured data as a building block
    Properties of events that are useful in debugging
    Conclusion
6. Stitching Events into Traces
    Distributed tracing and why it matters now
        The components of tracing
    Instrumenting a trace the hard way
        Adding custom fields into trace spans
    Stitching events into traces
    Conclusion
7. Analyzing Events to Achieve Observability
    Debugging from known conditions
    Debugging from first principles
        The core analysis loop
        Automating the brute force portion of the core analysis loop
    The misleading promise of AIOps
    Conclusion
8. How Observability and Monitoring Come Together
    Where monitoring fits
    Infrastructure considerations vs. software considerations
    Assessing your organizational needs
        Exceptions: infrastructure monitoring that can’t be ignored
    Real world examples
    Conclusion
9. Applying Observability Practices in Your Team
    Join a community group
    Start with the biggest pain points
    Buy instead of build
    Flesh out your instrumentation iteratively
    Look for opportunities to leverage existing efforts
    The last push is the hardest to complete
    Conclusion
10. Observability-Driven Development
    Test-driven development
    Observability in the development cycle
    Determining where to debug
        Debugging in the time of microservices
        How instrumentation drives observability
    Shifting observability left
11. Using Service Level Objectives for Reliability
    Introduction to Service Level Objectives
        Traditional Monitoring Approaches Create Dangerous Alert Fatigue
        Distributed Systems Exacerbate the Alerting Problem
        Static Thresholds Can’t Reliably Indicate Degraded User Experience
        Reliable Alerting with SLOs
        Changing Culture Toward SLO-Based Alerts: A Case Study
    Conclusion
12. Using Observability Data to Model Actionable SLOs
    Alerting before your error budget is empty
    Framing time as a sliding window
    Forecast models to create a predictive burn alert
        The lookahead window
        The baseline window
        Acting on SLO burn alerts
    Observability data for SLOs vs. time series data
    Conclusion
13. Cheap and Accurate Enough: Sampling
    Sampling to refine your data collection
    Different approaches to sampling
        Constant-probability sampling
        Sampling on recent traffic volume
        Sampling based on event content (keys)
        Combining per-key and historical methods
        Choosing dynamic sampling options
        When to make a sampling decision for traces
    Translating sampling strategies into code
        The base case
        Fixed-rate sampling
        Recording the sample rate
        Consistent sampling
        Target rate sampling
        Having more than one static sample rate
        Sampling by key and target rate
        Sampling with dynamic rates on arbitrarily many keys
        Putting it all together: head and tail per-key target rate sampling
    Conclusion
14. Build vs. Buy and Return on Investment
    How to analyze the ROI of observability
    The real costs of building your own
        The hidden costs of using “free” software
        The benefits of building your own
        The risks of building your own
    The real costs of buying software
        The hidden financial costs of commercial software
        The hidden non-financial costs of commercial software
        The benefits of buying commercial software
        The risks of buying commercial software
    Buy vs. Build is not a binary choice
    Conclusion
15. The Business Case for Observability
    The reactive approach to introducing change
    The proactive approach to introducing change
    Introducing observability as a practice
    Using the appropriate tools
        Instrumentation
        Data storage and analytics
        Rolling out tools to your teams
    Knowing when you have enough observability
    Conclusion
16. An Observability Maturity Model
    A foreword about maturity models
    Why observability needs a maturity model
    About the Observability Maturity Model
    Capabilities referenced in the OMM
        Respond to system failure with resilience
        Deliver high quality code
        Manage complexity and technical debt
        Release on a predictable cadence
        Understand user behavior
    Using the OMM for your organization
    Conclusion
About the Authors

How to download source code?

1. Go to: https://www.oreilly.com/

2. Search for the book title, Observability Engineering: Achieving Production Excellence. If the search returns no results, search for the main title, Observability Engineering, instead.

3. Click the book title in the search results

4. In the Publisher resources section, click Download Example Code.

Linkerd: Up and Running: A Guide to Operationalizing a Kubernetes-native Service Mesh

Python for Data Science, 2nd Edition

Visual Analytics Fundamentals: Creating Compelling Data Narratives with Tableau

Postman Cookbook: Hand-picked Solutions and Techniques across API Design, Testing, Performance, Networking, Kubernetes and Integration

Painless Docker: Unlock the Power of Docker and its Ecosystem

Bayesian Analysis with Excel and R