Software Telemetry shows you how to efficiently collect, store, and analyze system and application log data so you can monitor and improve your systems.
In Software Telemetry you will learn how to:
Manage toxic telemetry and confidential records
Master multi-tenant techniques and transformation processes
Update metrics and dashboards to improve their statistical validity
Make software telemetry emissions easier to parse
Build easily-auditable logging systems
Prevent and handle accidental data leaks
Maintain processes for legal compliance
Justify increased spend on telemetry software
Software Telemetry teaches you best practices for operating and updating telemetry systems. These vital systems trace, log, and monitor infrastructure by observing and analyzing the events generated by the system. This practical guide is filled with techniques you can apply to any size of organization, with troubleshooting techniques for every eventuality, and methods to ensure your compliance with standards like GDPR.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology
Take advantage of the data generated by your IT infrastructure! Telemetry systems provide feedback on what’s happening inside your data center and applications, so you can efficiently monitor, maintain, and audit them. This practical book guides you through instrumenting your systems, setting up centralized logging, doing distributed tracing, and other invaluable telemetry techniques.
About the book
Software Telemetry shows you how to efficiently collect, store, and analyze system and application log data so you can monitor and improve your systems. Manage the pillars of observability—logs, metrics, and traces—in an end-to-end telemetry system that integrates with your existing infrastructure. You’ll discover how software telemetry benefits both small startups and legacy enterprises. And at a time when data audits are increasingly common, you’ll appreciate the thorough coverage of legal compliance processes, so there’s no reason to panic when a discovery request arrives.
What's inside
Multi-tenant techniques and transformation processes
Toxic telemetry and confidential records
Updates to improve the statistical validity of your metrics and dashboards
Revisions that make software telemetry emissions easier to parse
About the reader
For software developers and infrastructure engineers supporting and building telemetry systems.
About the author
Jamie Riedesel is a staff engineer at Dropbox with over twenty years of experience in IT.
Table of Contents
1 Introduction
PART 1 TELEMETRY SYSTEM ARCHITECTURE
2 The Emitting stage: Creating and submitting telemetry
3 The Shipping stage: Moving and storing telemetry
4 The Shipping stage: Unifying diverse telemetry formats
5 The Presentation stage: Displaying telemetry
6 Marking up and enriching telemetry
7 Handling multitenancy
PART 2 USE CASES REVISITED: APPLYING ARCHITECTURE CONCEPTS
8 Growing cloud-based startup
9 Nonsoftware business
10 Long-established business IT
PART 3 TECHNIQUES FOR HANDLING TELEMETRY
11 Optimizing for regular expressions at scale
12 Standardized logging and event formats
13 Using more nonfile emitting techniques
14 Managing cardinality in telemetry
15 Ensuring telemetry integrity
16 Redacting and reprocessing telemetry
17 Building policies for telemetry retention and aggregation
18 Surviving legal processes
inside front cover
Software Telemetry
Copyright
dedication
brief contents
contents
front matter
preface
acknowledgments
about this book
Who should read this book
How this book is organized: A road map
About the code
liveBook discussion forum
Other online resources
about the author
about the cover illustration
1 Introduction
1.1 Defining the styles of telemetry
1.1.1 Defining centralized logging
1.1.2 Defining metrics
1.1.3 Defining distributed tracing
1.1.4 Defining SIEM
1.2 How telemetry is consumed by different teams
1.2.1 Telemetry use by Operations, DevOps, and SRE teams
1.2.2 Telemetry use by Security and Compliance teams
1.2.3 Telemetry use by Software Engineering and SRE teams
1.2.4 Telemetry use by Customer Support teams
1.2.5 Telemetry use by business intelligence
1.3 Challenges facing telemetry systems
1.3.1 Chronic underinvestment harms decision-making
1.3.2 Diverse needs resist standardization
1.3.3 Information spills and cleaning them up to avoid legal problems
1.3.4 Court orders break your assumptions
1.4 What you will learn
Summary
Part 1. Telemetry system architecture
2 The Emitting stage: Creating and submitting telemetry
2.1 Emitting from production code
2.1.1 Emitting telemetry into a log file
2.1.2 Emitting telemetry into the system log
2.1.3 Emitting telemetry into standard output
2.1.4 Formatting telemetry for emissions
2.2 Emitting from hardware
2.2.1 Explaining SNMP
2.2.2 Ingesting telemetry from a Cisco ASA firewall
2.3 Emitting from as-a-Service systems
2.3.1 Emitting events from SaaS systems
2.3.2 Emitting events from IaaS systems
Summary
3 The Shipping stage: Moving and storing telemetry
3.1 Emitter/shipper functions, telemetry from production code
3.1.1 Shipping directly into storage
3.1.2 Shipping through queues and streams
3.1.3 Shipping to SaaS systems
3.2 Shipping between SaaS systems
3.3 Tipping points in Shipping-stage architecture
Summary
4 The Shipping stage: Unifying diverse telemetry formats
4.1 Shipping locally-emitted telemetry
4.1.1 Shipping telemetry from a log file
4.1.2 Shipping telemetry from the system logger
4.1.3 Shipping telemetry from standard output
4.2 Unifying diverse emitting formats
4.2.1 Encoding telemetry into strings
4.2.2 Picking a shipping format
4.2.3 Converting Syslog to JSON or other object-encoding formats
4.2.4 Designing with cardinality in mind
Summary
5 The Presentation stage: Displaying telemetry
5.1 Displaying telemetry in metrics systems
5.1.1 Making pretty pictures with telemetry
5.1.2 Feeding the graphs with aggregation functions
5.1.3 Using aggregations with pdf_pages
5.2 Displaying telemetry in centralized logging systems
5.2.1 Selecting needed features in a display system for centralized logging
5.2.2 Demonstrating centralized logging display
5.3 Displaying telemetry in security systems
5.4 Displaying telemetry in distributed tracing systems
5.5 Displaying telemetry in large organizations
Summary
6 Marking up and enriching telemetry
6.1 Markup in the Emitting stage
6.2 Markup and enrichment in the Shipping stage
6.2.1 Applying context-related telemetry in the Shipping stage
6.2.2 Extracting and enriching telemetry in-flight
6.2.3 Converting field types during the Shipping stage
6.3 Enrichment in the Presentation stage
6.4 How telemetry style affects markup and enrichment
6.4.1 Markup and enrichment with centralized logging
6.4.2 Markup and enrichment with SIEM systems
6.4.3 Markup and enrichment with metrics
6.4.4 Markup and enrichment with distributed tracing systems
Summary
7 Handling multitenancy
7.1 How multitenant architectures come about
7.1.1 Evolving multitenancy in an early-stage startup
7.1.2 Evolving multitenancy in a culture of free sharing
7.1.3 Evolving multitenancy in a culture of strong separation
7.2 Designing multitenant telemetry systems
7.2.1 Multitenancy in the Shipping stage
7.2.2 Multitenancy in the Presentation stage
Summary
Part 2. Use cases revisited: Applying architecture concepts
8 Growing cloud-based startup
8.1 Telemetry at the small-company stage
8.1.1 Describing the small company’s telemetry system
8.1.2 Analyzing the small company’s telemetry system
8.2 Telemetry at the medium-size company stage
8.2.1 Describing the medium-size company’s telemetry system
8.2.2 Analyzing the medium-size company’s telemetry system
8.3 Telemetry at the large-company stage
8.3.1 Describing the large company’s telemetry system
8.3.2 Analyzing the large company’s telemetry system
8.4 Telemetry at the enterprise stage
8.5 Looking back at all this growth
Summary
9 Nonsoftware business
9.1 Telemetry use in small organizations
9.2 Telemetry use in medium-size organizations
9.3 Telemetry use in large organizations
9.4 Telemetry use in enterprise organizations
Summary
10 Long-established business IT
10.1 Telemetry use in medium-size organizations
10.1.1 Telemetry use in office IT
10.1.2 Telemetry use in production systems
10.2 Telemetry use in large organizations
10.3 Telemetry use in global organizations
10.3.1 Telemetry use in the Booking and Passenger Manifest department
10.3.2 Telemetry use in the Loyalty Programs department
Summary
Part 3. Techniques for handling telemetry
11 Optimizing for regular expressions at scale
11.1 Anchoring expressions for speed
11.2 Building expressions to fail fast
11.3 Digging into the Cisco ASA firewall telemetry
11.4 Refining emissions to speed regular-expression performance
11.5 Additional regular-expression resources
Summary
12 Standardized logging and event formats
12.1 Implementing structured logging in your code
12.2 Implementing standards in your code
12.3 Implementing standards in the Shipping stage
Summary
13 Using more nonfile emitting techniques
13.1 Designing for socket- and datagram-based emitters
13.2 Emitting and shipping for container- and serverless-based code
13.2.1 Emitting and shipping from containerd-based code
13.2.2 Emitting and shipping from serverless-based code
13.3 Encrypting UDP-based telemetry
Summary
14 Managing cardinality in telemetry
14.1 Identifying cardinality problems
14.1.1 Cardinality in time-series databases
14.1.2 Cardinality in logging databases
14.2 Lowering the cost of cardinality
14.2.1 Use logging standards to contain cardinality
14.2.2 Using storage-side methods to tame cardinality
14.2.3 Make cardinality someone else’s problem
Summary
15 Ensuring telemetry integrity
15.1 Getting telemetry out of reach of an attacker
15.1.1 Move telemetry too fast to catch
15.1.2 Use ACLs to enforce write-only telemetry
15.1.3 Durable telemetry when using SaaS providers
15.2 Making telemetry harder to mess with
15.2.1 Using access control requirements to defend against attacks
15.2.2 Ensuring configuration integrity in your telemetry systems
15.2.3 Making changes obvious
Summary
16 Redacting and reprocessing telemetry
16.1 Identifying toxic data and where it comes from
16.2 Redacting toxic information spills
16.3 Reprocessing telemetry to support upgrades
16.4 Isolating toxic data to reduce cleanup costs
Summary
17 Building policies for telemetry retention and aggregation
17.1 Creating a retention policy
17.1.1 Building a policy for centralized logging
17.1.2 Building a policy for metrics
17.1.3 Building a policy for distributed tracing
17.1.4 Building a policy for SIEM systems
17.2 Creating an aggregation policy
17.3 Using sampling to reduce costs and increase retention
Summary
18 Surviving legal processes
18.1 Defining the eDiscovery process
18.2 Dealing with records-retention requests
18.2.1 Examining an ELK-based centralized logging system
18.2.2 Examining a Sumo Logic-based centralized logging system
18.3 Dealing with document-production requests
18.3.1 Telemetry in the collection phase
18.3.2 Telemetry in the review phase
18.3.3 Telemetry in the production phase
18.4 Working with lawyers
Summary
Appendix A. Telemetry storage systems
A.1 Analyzing Elasticsearch
A.1.1 What Elasticsearch is good at
A.1.2 What is challenging for Elasticsearch
A.2 Analyzing Apache Cassandra
A.2.1 What Cassandra is good at
A.2.2 What is challenging for Cassandra
A.3 Analyzing Grafana Labs’ Loki
A.3.1 What Loki is good at
A.3.2 What is challenging for Loki
A.4 Analyzing MongoDB
A.4.1 What MongoDB is good at
A.4.2 What is challenging for MongoDB
A.5 Analyzing Prometheus
A.5.1 What Prometheus is good at
A.5.2 What is challenging for Prometheus
A.6 Analyzing InfluxDB
A.6.1 What InfluxDB is good at
A.6.2 What is challenging for InfluxDB
A.7 Analyzing Jaeger
A.7.1 What Jaeger is good at
A.7.2 What is challenging for Jaeger
Appendix B. Recommendation checklist reference
B.1 Telemetry standards, structure, and setting policies
Section 4.2.2: Setting standardized telemetry formats
Section 4.2.4: Designing telemetry formats with cardinality in mind
Section 6.4.1: When and where to mark up or enrich telemetry in centralized logging systems
Section 6.4.3: When and where to mark up or enrich telemetry in metrics systems
Section 7.2.1: How parasitic is that parasitic load?
Chapter 11: Making regular expressions fast
Section 11.4: The project phases for optimizing your logging statements for regular expressions
Chapter 12: The benefits of using a structured logger
Section 13.1: In-memory networking and how it eases telemetry
Section 14.2.1: Enforcing logging standards through development process
Section 17.1.3: Recommendations on setting a tracing retention policy
Section 17.1.4: Recommendations on setting a SIEM retention policy
Section 17.3: Considerations when picking a sampling rate
B.2 Presentation-stage recommendations
Section 5.1.1: The features of a good metrics system
Section 5.1.1: Considerations for building dashboards
Section 5.2.1: The features of a good centralized logging system
Section 5.3: Extending centralized logging to SIEM work
Section 7.2.2: Adding multitenancy
B.3 Cardinality management
Section 4.2.4: Designing telemetry formats with cardinality in mind
Section 14.1: The symptoms of high cardinality
Section 14.2.1: Healthy low-cardinality context-related telemetry
Section 14.2.2: How sharding affects cardinality management
Section 14.2.3: When to make cardinality someone else’s problem
B.4 Telemetry safety and effects
Chapter 15: The two principles of secure telemetry
Section 15.1.1: Moving telemetry too fast to catch
Section 15.2.1: The three Linux Mandatory Access Control systems
Section 15.2.1: Places to use ACLs in a telemetry pipeline
Section 15.2.3: How encryption and digital signatures support telemetry
Section 15.2.3: How encryption and digital signatures make telemetry more fragile
Section 16.1: The three types of toxic data
Section 16.1: The penalties for mishandling toxic data
Section 16.3: What drives periodic reprocessing
Section 16.4: Why isolating telemetry helps you
Section 16.4: Tips to avoid false-positive toxic-data detections
B.5 Legal topics
Section 18.2: Questions to ask when assessing a telemetry system to handle legal hold orders
Section 18.4: How to work with lawyers
Appendix C. Exercise answers
index
inside back cover