Kubeflow for Machine Learning: From Lab to Production

by Boris Lublinsky, Holden Karau, Ilan Filonenko, Richard Liu, Trevor Grant

Length: 264 pages
Edition: 1
Language: English
Publisher: O'Reilly Media
Publication Date: 2020-11-03
ISBN-10: 1492050121
ISBN-13: 9781492050124

If you’re training a machine learning model but aren’t sure how to put it into production, this book will get you there. Kubeflow provides a collection of cloud native tools for different stages of a model’s lifecycle, from data exploration, feature preparation, and model training to model serving. This guide helps data scientists build production-grade machine learning implementations with Kubeflow and shows data engineers how to make models scalable and reliable.

Using examples throughout the book, authors Holden Karau, Trevor Grant, Ilan Filonenko, Richard Liu, and Boris Lublinsky explain how to use Kubeflow to train and serve your machine learning models on top of Kubernetes in the cloud or in a development environment on-premises.

Understand Kubeflow’s design, core components, and the problems it solves
Understand the differences between Kubeflow on different cluster types
Train models using Kubeflow with popular tools including Scikit-learn, TensorFlow, and Apache Spark
Keep your model up to date with Kubeflow Pipelines
Understand how to capture model training metadata
Explore how to extend Kubeflow with additional open source tools
Use hyperparameter tuning for training
Learn how to serve your model in production

Foreword
Preface
    Our Assumption About You
    Your Responsibility as a Practitioner
    Conventions Used in This Book
    Code Examples
        Using Code Examples
    O’Reilly Online Learning
    How to Contact the Authors
    How to Contact Us
    Acknowledgments
    Grievances
1. Kubeflow: What It Is and Who It Is For
    Model Development Life Cycle
    Where Does Kubeflow Fit In?
    Why Containerize?
    Why Kubernetes?
    Kubeflow’s Design and Core Components
        Data Exploration with Notebooks
        Data/Feature Preparation
        Training
        Hyperparameter Tuning
        Model Validation
        Inference/Prediction
        Pipelines
        Component Overview
    Alternatives to Kubeflow
        Clipper (RISELab)
        MLflow (Databricks)
        Others
    Introducing Our Case Studies
        Modified National Institute of Standards and Technology
        Mailing List Data
        Product Recommender
        CT Scans
    Conclusion
2. Hello Kubeflow
    Getting Set Up with Kubeflow
        Installing Kubeflow and Its Dependencies
        Setting Up Local Kubernetes
            Minikube
        Setting Up Your Kubeflow Development Environment
            Setting up the Pipeline SDK
            Setting up Docker
            Editing YAML
        Creating Our First Kubeflow Project
    Training and Deploying a Model
        Training and Monitoring Progress
        Test Query
    Going Beyond a Local Deployment
    Conclusion
3. Kubeflow Design: Beyond the Basics
    Getting Around the Central Dashboard
        Notebooks (JupyterHub)
        Training Operators
        Kubeflow Pipelines
        Hyperparameter Tuning
        Model Inference
        Metadata
        Component Summary
    Support Components
        MinIO
        Istio
        Knative
        Apache Spark
        Kubeflow Multiuser Isolation
    Conclusion
4. Kubeflow Pipelines
    Getting Started with Pipelines
        Exploring the Prepackaged Sample Pipelines
        Building a Simple Pipeline in Python
        Storing Data Between Steps
    Introduction to Kubeflow Pipelines Components
        Argo: the Foundation of Pipelines
        What Kubeflow Pipelines Adds to Argo Workflow
        Building a Pipeline Using Existing Images
        Kubeflow Pipeline Components
    Advanced Topics in Pipelines
        Conditional Execution of Pipeline Stages
        Running Pipelines on Schedule
    Conclusion
5. Data and Feature Preparation
    Deciding on the Correct Tooling
    Local Data and Feature Preparation
        Fetching the Data
        Data Cleaning: Filtering Out the Junk
        Formatting the Data
        Feature Preparation
        Custom Containers
    Distributed Tooling
        TensorFlow Extended
            Keeping your data quality: TensorFlow data validation
            TensorFlow Transform, with TensorFlow Extended on Beam
        Distributed Data Using Apache Spark
            Spark operators in Kubeflow
            Reading the input data
            Validating the schema
            Handling missing fields
            Filtering out bad data
            Saving the output
        Distributed Feature Preparation Using Apache Spark
    Putting It Together in a Pipeline
    Using an Entire Notebook as a Data Preparation Pipeline Stage
    Conclusion
6. Artifact and Metadata Store
    Kubeflow ML Metadata
        Programmatic Query
        Kubeflow Metadata UI
    Using MLflow’s Metadata Tools with Kubeflow
        Creating and Deploying an MLflow Tracking Server
        Logging Data on Runs
        Using the MLflow UI
    Conclusion
7. Training a Machine Learning Model
    Building a Recommender with TensorFlow
        Getting Started
        Starting a New Notebook Session
        TensorFlow Training
    Deploying a TensorFlow Training Job
    Distributed Training
        Using GPUs
        Using Other Frameworks for Distributed Training
    Training a Model Using Scikit-Learn
        Starting a New Notebook Session
        Data Preparation
        Scikit-Learn Training
        Explaining the Model
        Exporting Model
        Integration into Pipelines
    Conclusion
8. Model Inference
    Model Serving
        Model Serving Requirements
    Model Monitoring
        Model Accuracy, Drift, and Explainability
        Model Monitoring Requirements
    Model Updating
        Model Updating Requirements
    Summary of Inference Requirements
    Model Inference in Kubeflow
    TensorFlow Serving
        Review
            Model serving
            Model monitoring
            Model updating
            Summary
    Seldon Core
        Designing a Seldon Inference Graph
            Setting up Seldon Core
            Packaging your model
            Creating a SeldonDeployment
        Testing Your Model
            Python client for Python language wrapped models
            Local testing with Docker
        Serving Requests
        Monitoring Your Models
            Model explainability
            Sentiment prediction model
            US Census income predictor model example
            Outlier and drift detection
        Review
            Model serving
            Model monitoring
            Model updating
            Summary
    KFServing
        Serverless and the Service Plane
        Data Plane
        Example Walkthrough
            Setting up KFServing
            Simplicity and extensibility
            Recommender example
        Peeling Back the Underlying Infrastructure
            Going layer by layer
            Escape hatches
            Debugging an InferenceService
            Debugging performance
            Knative Eventing
            Additional features
            API documentation
        Review
            Model serving
            Model monitoring
            Model updating
            Summary
    Conclusion
9. Case Study Using Multiple Tools
    The Denoising CT Scans Example
        Data Prep with Python
        DS-SVD with Apache Spark
        Visualization
            Downloading DRMs
            Recomposing the matrix into denoised images
        The CT Scan Denoising Pipeline
            Spark operation manifest
            The pipeline
    Sharing the Pipeline
    Conclusion
10. Hyperparameter Tuning and Automated Machine Learning
    AutoML: An Overview
    Hyperparameter Tuning with Kubeflow Katib
    Katib Concepts
    Installing Katib
    Running Your First Katib Experiment
        Prepping Your Training Code
        Configuring an Experiment
        Running the Experiment
        Katib User Interface
    Tuning Distributed Training Jobs
    Neural Architecture Search
    Advantages of Katib over Other Frameworks
    Conclusion
A. Argo Executor Configurations and Trade-Offs
B. Cloud-Specific Tools and Configuration
    Google Cloud
        TPU-Accelerated Instances
        Dataflow for TFX
C. Using Model Serving in Applications
    Building Streaming Applications Leveraging Model Serving
        Stream Processing Engines and Libraries
        Introducing Cloudflow
    Building Batch Applications Leveraging Model Serving
Index