Kubeflow for Machine Learning: From Lab to Production
- Length: 264 pages
- Edition: 1
- Language: English
- Publisher: O'Reilly Media
- Publication Date: 2020-11-03
- ISBN-10: 1492050121
- ISBN-13: 9781492050124
- Sales Rank: #5620449
If you’re training a machine learning model but aren’t sure how to put it into production, this book will get you there. Kubeflow provides a collection of cloud native tools for different stages of a model’s lifecycle, from data exploration, feature preparation, and model training to model serving. This guide helps data scientists build production-grade machine learning implementations with Kubeflow and shows data engineers how to make models scalable and reliable.
Using examples throughout the book, authors Holden Karau, Trevor Grant, Ilan Filonenko, Richard Liu, and Boris Lublinsky explain how to use Kubeflow to train and serve your machine learning models on top of Kubernetes in the cloud or in a development environment on-premises.
- Understand Kubeflow’s design, core components, and the problems it solves
- Understand the differences between running Kubeflow on different cluster types
- Train models using Kubeflow with popular tools including Scikit-learn, TensorFlow, and Apache Spark
- Keep your model up to date with Kubeflow Pipelines
- Understand how to capture model training metadata
- Explore how to extend Kubeflow with additional open source tools
- Use hyperparameter tuning for training
- Learn how to serve your model in production
Table of Contents
Foreword
Preface
  - Our Assumption About You
  - Your Responsibility as a Practitioner
  - Conventions Used in This Book
  - Code Examples
  - Using Code Examples
  - O’Reilly Online Learning
  - How to Contact the Authors
  - How to Contact Us
  - Acknowledgments
  - Grievances
1. Kubeflow: What It Is and Who It Is For
  - Model Development Life Cycle
  - Where Does Kubeflow Fit In?
  - Why Containerize?
  - Why Kubernetes?
  - Kubeflow’s Design and Core Components
  - Data Exploration with Notebooks
  - Data/Feature Preparation
  - Training
  - Hyperparameter Tuning
  - Model Validation
  - Inference/Prediction
  - Pipelines
  - Component Overview
  - Alternatives to Kubeflow
  - Clipper (RiseLabs)
  - MLflow (Databricks)
  - Others
  - Introducing Our Case Studies
  - Modified National Institute of Standards and Technology
  - Mailing List Data
  - Product Recommender
  - CT Scans
  - Conclusion
2. Hello Kubeflow
  - Getting Set Up with Kubeflow
  - Installing Kubeflow and Its Dependencies
  - Setting Up Local Kubernetes
  - Minikube
  - Setting Up Your Kubeflow Development Environment
  - Setting up the Pipeline SDK
  - Setting up Docker
  - Editing YAML
  - Creating Our First Kubeflow Project
  - Training and Deploying a Model
  - Training and Monitoring Progress
  - Test Query
  - Going Beyond a Local Deployment
  - Conclusion
3. Kubeflow Design: Beyond the Basics
  - Getting Around the Central Dashboard
  - Notebooks (JupyterHub)
  - Training Operators
  - Kubeflow Pipelines
  - Hyperparameter Tuning
  - Model Inference
  - Metadata
  - Component Summary
  - Support Components
  - MinIO
  - Istio
  - Knative
  - Apache Spark
  - Kubeflow Multiuser Isolation
  - Conclusion
4. Kubeflow Pipelines
  - Getting Started with Pipelines
  - Exploring the Prepackaged Sample Pipelines
  - Building a Simple Pipeline in Python
  - Storing Data Between Steps
  - Introduction to Kubeflow Pipelines Components
  - Argo: the Foundation of Pipelines
  - What Kubeflow Pipelines Adds to Argo Workflow
  - Building a Pipeline Using Existing Images
  - Kubeflow Pipeline Components
  - Advanced Topics in Pipelines
  - Conditional Execution of Pipeline Stages
  - Running Pipelines on Schedule
  - Conclusion
5. Data and Feature Preparation
  - Deciding on the Correct Tooling
  - Local Data and Feature Preparation
  - Fetching the Data
  - Data Cleaning: Filtering Out the Junk
  - Formatting the Data
  - Feature Preparation
  - Custom Containers
  - Distributed Tooling
  - TensorFlow Extended
  - Keeping your data quality: TensorFlow data validation
  - TensorFlow Transform, with TensorFlow Extended on Beam
  - Distributed Data Using Apache Spark
  - Spark operators in Kubeflow
  - Reading the input data
  - Validating the schema
  - Handling missing fields
  - Filtering out bad data
  - Saving the output
  - Distributed Feature Preparation Using Apache Spark
  - Putting It Together in a Pipeline
  - Using an Entire Notebook as a Data Preparation Pipeline Stage
  - Conclusion
6. Artifact and Metadata Store
  - Kubeflow ML Metadata
  - Programmatic Query
  - Kubeflow Metadata UI
  - Using MLflow’s Metadata Tools with Kubeflow
  - Creating and Deploying an MLflow Tracking Server
  - Logging Data on Runs
  - Using the MLflow UI
  - Conclusion
7. Training a Machine Learning Model
  - Building a Recommender with TensorFlow
  - Getting Started
  - Starting a New Notebook Session
  - TensorFlow Training
  - Deploying a TensorFlow Training Job
  - Distributed Training
  - Using GPUs
  - Using Other Frameworks for Distributed Training
  - Training a Model Using Scikit-Learn
  - Starting a New Notebook Session
  - Data Preparation
  - Scikit-Learn Training
  - Explaining the Model
  - Exporting Model
  - Integration into Pipelines
  - Conclusion
8. Model Inference
  - Model Serving
  - Model Serving Requirements
  - Model Monitoring
  - Model Accuracy, Drift, and Explainability
  - Model Monitoring Requirements
  - Model Updating
  - Model Updating Requirements
  - Summary of Inference Requirements
  - Model Inference in Kubeflow
  - TensorFlow Serving
  - Review
  - Model serving
  - Model monitoring
  - Model updating
  - Summary
  - Seldon Core
  - Designing a Seldon Inference Graph
  - Setting up Seldon Core
  - Packaging your model
  - Creating a SeldonDeployment
  - Testing Your Model
  - Python client for Python language wrapped models
  - Local testing with Docker
  - Serving Requests
  - Monitoring Your Models
  - Model explainability
  - Sentiment prediction model
  - US Census income predictor model example
  - Outlier and drift detection
  - Review
  - Model serving
  - Model monitoring
  - Model updating
  - Summary
  - KFServing
  - Serverless and the Service Plane
  - Data Plane
  - Example Walkthrough
  - Setting up KFServing
  - Simplicity and extensibility
  - Recommender example
  - Peeling Back the Underlying Infrastructure
  - Going layer by layer
  - Escape hatches
  - Debugging an InferenceService
  - Debugging performance
  - Knative Eventing
  - Additional features
  - API documentation
  - Review
  - Model serving
  - Model monitoring
  - Model updating
  - Summary
  - Conclusion
9. Case Study Using Multiple Tools
  - The Denoising CT Scans Example
  - Data Prep with Python
  - DS-SVD with Apache Spark
  - Visualization
  - Downloading DRMs
  - Recomposing the matrix into denoised images
  - The CT Scan Denoising Pipeline
  - Spark operation manifest
  - The pipeline
  - Sharing the Pipeline
  - Conclusion
10. Hyperparameter Tuning and Automated Machine Learning
  - AutoML: An Overview
  - Hyperparameter Tuning with Kubeflow Katib
  - Katib Concepts
  - Installing Katib
  - Running Your First Katib Experiment
  - Prepping Your Training Code
  - Configuring an Experiment
  - Running the Experiment
  - Katib User Interface
  - Tuning Distributed Training Jobs
  - Neural Architecture Search
  - Advantages of Katib over Other Frameworks
  - Conclusion
A. Argo Executor Configurations and Trade-Offs
B. Cloud-Specific Tools and Configuration
  - Google Cloud
  - TPU-Accelerated Instances
  - Dataflow for TFX
C. Using Model Serving in Applications
  - Building Streaming Applications Leveraging Model Serving
  - Stream Processing Engines and Libraries
  - Introducing Cloudflow
  - Building Batch Applications Leveraging Model Serving
Index
How to download the source code:
1. Go to https://www.oreilly.com/.
2. Search for the book title, Kubeflow for Machine Learning: From Lab to Production. If the search returns no results, search for the main title only.
3. Click the book title in the search results.
4. In the Publisher resources section, click Download Example Code.