Hands-on Site Reliability Engineering: Build Capability to Design, Deploy, Monitor, and Sustain Enterprise Software Systems at Scale
- Length: 236 pages
- Edition: 1
- Language: English
- Publisher: BPB Publications
- Publication Date: 2021-07-06
- ISBN-10: 9391030327
- ISBN-13: 9789391030322
- Sales Rank: #1301074
A comprehensive guide with basic to advanced SRE practices and hands-on examples.
Key Features
- Demonstrates how to execute site reliability engineering along with fundamental concepts.
- Illustrates real-world examples and successful techniques to put SRE into production.
- Introduces you to DevOps, advanced techniques of SRE, and popular tools in use.
Description
Hands-on Site Reliability Engineering (SRE) is a guide to learning and practicing the activities that keep enterprise systems running smoothly, from designing and deploying enterprise software to operating it efficiently and reliably at scale.
The book explores the fundamentals of SRE and the related terms, concepts, and techniques used by SRE teams and experts. It discusses the essential elements of an IT system, including microservices, application architectures, types of software deployment, and concepts like load balancing. It explains techniques for delivering timely software releases using containerization and CI/CD pipelines. It also covers how to track and monitor application performance using Grafana, Prometheus, and Kibana, and how to extend monitoring by building full-stack observability into the system.
The book also covers chaos engineering, types of system failures, designing for high availability, DevSecOps, and AIOps.
What you will learn
- Learn the best techniques and practices for building and running reliable software.
- Explore observability and popular methods for effective monitoring of applications.
- Work with SLIs, SLOs, error budgets, and error budget policies to manage failures.
Who this book is for
This book caters to experienced IT professionals, application developers, software engineers, and all those who are looking to develop SRE capabilities at the individual or team level.
About the Authors
Shamayel M. Farooqui is a technology leader who specializes in driving digital transformation for organizations and is the author of ‘Enterprise DevOps Framework – Transforming IT Operations’.
He has expertise in implementing IT security, cloud migrations, and IT automation, and a proven track record of building teams of skilled site reliability engineers focused on delivering solutions for optimizing and running hybrid, multi-cloud environments.
Blog links: http://www.shamayelfarooqui.com, https://www.xfgeek.com/home
LinkedIn Profile: https://www.linkedin.com/in/shamayel/
Vishnu Vardhan Chikoti has diverse experience in the areas of Application and Database design and development, Micro-services & Micro-frontends, DevOps, Site Reliability Engineering, and Machine Learning.
With the ability to conduct deep analysis, strong execution skills, and an innovative mindset, he has successfully led R&D teams to build engineering solutions to improve the reliability of applications. He is also an expert in building high-volume transaction processing applications for middle and back-office functions for Investment Banks using a variety of architectures.
LinkedIn Profile: https://www.linkedin.com/in/vishnu-vardhan-chikoti-3763262/
Table of Contents
Front matter: Cover Page; Title Page; Copyright Page; Foreword; Dedication Page; About the Authors; About the Reviewer; Acknowledgement; Preface; Errata; Table of Contents
1. Understanding the World of IT
Structure; Objective; What is the role of IT in an organization?; Hardware availability; Core software services; Compliance and security; Application development and hosting; Enterprise Architecture (EA); Software delivery; Understanding the IT organization structure; Role of infrastructure teams; Data centers; Virtualization; Containerization; On-premise infrastructure; Cloud infrastructure; Development and deployment platforms; Role of application teams; Cross-functional development teams; DevOps teams; Production support/operations teams; IT security; Change management team; The TCP/IP protocol suite; Domain Name System; Conclusion; Multiple choice questions; Answers
2. Introduction to DevOps
Structure; Objective; Introduction to DevOps; DevOps principles and practices; DevOps principles; DevOps practices; Benefits of DevOps; Overview of DevOps tools; Git; Ansible; Jenkins; Conclusion; Multiple choice questions; Answers
3. Introduction to SRE
Structure; Objective; DevOps and SRE; Rise of internet companies; SRE overview; SRE terms; SRE team responsibilities; Skill set of SREs; Conclusion; Multiple choice questions; Answers
4. Identify and Eliminate Toil
Structure; Objective; Understanding toil; Importance of eliminating toil; Process optimization with automation; Examples of toil with approaches to automate; Purging and archiving of files; Purging of database tables; Installation/Patching; Monitoring; Checking log files; Identify and Access Management; Vulnerability scans; Infrastructure provisioning/decommissioning; Incident management; Conclusion; Multiple choice questions; Answers
5. Release Management
Structure; Objective; Understanding release management; Release planning; Build package; Test for quality and security; Deployment; Release automation with CI/CD; Using IaC for release management; Blue-green deployments; Canary deployments; Conclusion; Multiple Choice Questions; Answers
6. Incident Management
Structure; Objective; Understanding an incident management; Incident; Incident lifecycle; Blameless postmortems; Incident example; Incident detection/notification; Incident triage; Incident communication; Incident resolution; Incident retrospective/postmortem; Incident knowledge base; Role of development teams; Conclusion; Multiple choice questions; Answers
7. IT Monitoring
Structure; Objective; End to end monitoring strategy; Infrastructure monitoring; Server monitoring; Network monitoring; Storage monitoring; Application monitoring; Probes; Checking logs; Capturing processing time; MQ monitoring; Database monitoring; End user monitoring; DNS monitoring; Monitoring Tools; Agents; Transport; Collectors; Data transformation; Storage; Alerting; Dashboarding; Prometheus; Metricbeat; Grafana; ElastAlert; Conclusion; Multiple choice questions; Answers
8. Observability
Structure; Objective; Goals of observability; Service reliability; Operational efficiency; Security and compliance; Three pillars of observability; Standardized libraries/APIs/SDKs; Standardized trace context; Tracers; Cardinality attributes; Open source libraries and tools; Filebeat; Logstash; Fluentd; OpenTelemetry; Conclusion; Multiple Choice Questions; Answers
9. Key SRE KPIs: SLAs, SLOs, SLIs, and Error Budgets
Structure; Objective; Key metrics for SRE; Service level indicator (SLI); Service Level Objective (SLO); Service level agreement (SLA); Error budgets; Error budget policy; Conclusion; Multiple choice questions; Answers
10. Chaos Engineering
Structure; Objective; Introducing chaos engineering; Application/service unavailability; Network delays; Network failures; Resource unavailability; Configuration errors; Database failures; Chaos engineering process; Define steady state; Build a hypothesis; Minimize blast radius; Inject the failure condition; Verify hypothesis; Reverse failure condition; Fix any issues; Automate to run continuously; Chaos GameDays; Injecting failures; Killing a process; Network failures; HTTP failures; Injecting multiple failures; Techniques for building resiliency; Single point of failures; Rate limiting/throttling; Circuit breaker; Handle retry storms; Conclusion; Multiple choice questions; Answers
11. DevSecOps and AIOps
Structure; Objective; Understanding DevSecOps; Code scanning for security; Secure releases using Infrastructure as Code; Introduction to AIOps; Use cases with AIOps; Intelligent alerting; Noise reduction; Automated root cause analysis; Automated remediation; ChatOps; ChatOps example with Rasa, Flask, and Telegram; Conclusion; Multiple choice questions; Answers
12. Culture of Site Reliability Engineering
Structure; Objective; Breaking silos in the organization; Embracing risk; Continuous improvement; Intelligent automation; Shift-left mindset; Conclusion; Multiple choice questions; Answers
Index