
Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications
- Length: 350 pages
- Edition: 1
- Language: English
- Publisher: O'Reilly Media
- Publication Date: 2022-07-19
- ISBN-10: 1098107969
- ISBN-13: 9781098107963
- Sales Rank: #1367842
Many tutorials show you how to develop ML systems from ideation to deployed models. But with constant changes in tooling, those systems can quickly become outdated. Without an intentional design to hold the components together, these systems will become a technical liability, prone to errors and quick to fall apart.
In this book, Chip Huyen provides a framework for designing real-world ML systems that are quick to deploy, reliable, scalable, and iterative. These systems have the capacity to learn from new data, improve on past mistakes, and adapt to changing requirements and environments. You'll learn everything from project scoping, data management, model development, deployment, and infrastructure to team structure and business analysis.
- Learn the challenges and requirements of an ML system in production
- Build training data with different sampling and labeling methods
- Leverage the best techniques to engineer features for your ML models and avoid data leakage
- Select, develop, debug, and evaluate the ML models best suited for your tasks
- Deploy different types of ML systems for different hardware
- Explore major infrastructural choices and hardware designs
- Understand the human side of ML, including integrating ML into business, user experience, and team structure
Table of Contents
1. Machine Learning Systems in Production
   When to Use Machine Learning; Machine Learning Use Cases; Understanding Machine Learning Systems; Mind vs. Data; Machine Learning in Research vs. in Production; Machine Learning Systems vs. Traditional Software; Designing ML Systems in Production; Requirements for ML Systems; Iterative Process; Summary
2. Data Engineering Fundamentals
   Data Sources; Data Formats; JSON; Row-Major vs. Column-Major Format; Text vs. Binary Format; Data Models; Relational Model; NoSQL; Structured vs. Unstructured Data; Data Storage Engines and Processing; Transactional and Analytical Processing; ETL: Extract, Transform, and Load; Modes of Dataflow; Data Passing Through Databases; Data Passing Through Services; Data Passing Through Real-Time Transport; Batch Processing vs. Stream Processing; Summary
3. Training Data
   Sampling; Non-Probability Sampling; Simple Random Sampling; Stratified Sampling; Weighted Sampling; Importance Sampling; Reservoir Sampling; Labeling; Hand Labels; Handling the Lack of Hand Labels; Class Imbalance; Challenges of Class Imbalance; Handling Class Imbalance; Data Augmentation; Simple Label-Preserving Transformations; Perturbation; Data Synthesis; Summary
4. Feature Engineering
   Learned Features vs. Engineered Features; Common Feature Engineering Operations; Handling Missing Values; Scaling; Discretization; Encoding Categorical Features; Feature Crossing; Discrete and Continuous Positional Embeddings; Data Leakage; Common Causes for Data Leakage; Detecting Data Leakage; Engineering Good Features; Feature Importance; Feature Generalization; Summary
5. Model Development
   Framing ML Problems; Types of ML Tasks; Objective Functions; Model Development and Training; Evaluating ML Models; Ensembles; Experiment Tracking and Versioning; Distributed Training; AutoML; Model Offline Evaluation; Baselines; Evaluation Methods; Summary
6. Model Deployment
   Machine Learning Deployment Myths; Batch Prediction vs. Online Prediction; From Batch Prediction to Online Prediction; Unifying Batch Pipeline and Streaming Pipeline; Model Compression; Low-Rank Factorization; Knowledge Distillation; Pruning; Quantization; ML on the Cloud and on the Edge; Compiling and Optimizing Models for Edge Devices; ML in Browsers; Summary
7. Why Machine Learning Systems Fail in Production
   Natural Labels and Feedback Loop; Causes of ML System Failures; Production Data Differing From Training Data; Edge Cases; Degenerate Feedback Loop; Data Distribution Shifts; Types of Data Distribution Shifts; General Data Distribution Shifts; Handling Data Distribution Shifts; Summary
About the Author
How do I download the source code?
1. Go to https://www.oreilly.com/.
2. Search for the book title: Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications. If the full title returns no results, search for the main title only.
3. Click the book title in the search results.
4. In the Publisher resources section, click Download Example Code.