Amazon SageMaker Best Practices: Proven tips and tricks to build successful machine learning solutions on Amazon SageMaker
Overcome advanced challenges in building end-to-end ML solutions by leveraging the capabilities of Amazon SageMaker for developing and integrating ML models into production
- Learn best practices for all phases of building machine learning solutions – from data preparation to monitoring models in production
- Automate end-to-end machine learning workflows with Amazon SageMaker and related AWS services
- Design, architect, and operate machine learning workloads in the AWS Cloud
Amazon SageMaker is a fully managed AWS service that provides the ability to build, train, deploy, and monitor machine learning models. The book begins with a high-level overview of Amazon SageMaker capabilities that map to the various phases of the machine learning process to help set the right foundation. You’ll learn efficient tactics to address data science challenges such as processing data at scale, data preparation, connecting to big data pipelines, identifying data bias, running A/B tests, and model explainability using Amazon SageMaker. As you advance, you’ll understand how you can tackle the challenge of training at scale, including how to use large datasets while saving costs, monitoring training resources to identify bottlenecks, speeding up long training jobs, and tracking multiple models trained for a common goal. Moving ahead, you’ll find out how you can integrate Amazon SageMaker with other AWS services to build reliable, cost-optimized, and automated machine learning applications. In addition, you’ll build ML pipelines integrated with MLOps principles and apply best practices to build secure and performant solutions.
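The A/B testing mentioned above is built on splitting inference traffic across model variants by configured weight, which is what SageMaker endpoint production variants do. As a hedged illustration of the underlying idea only (plain Python, not the SageMaker API; the variant names and weights are made up for the example):

```python
import random

def pick_variant(variants, rng=random):
    """Pick a model variant in proportion to its traffic weight.

    `variants` is a list of (name, weight) pairs; weights need not sum to 1.
    This mirrors the idea behind SageMaker endpoint production variants,
    where each variant receives a configured share of inference traffic.
    """
    total = sum(weight for _, weight in variants)
    r = rng.uniform(0, total)
    cumulative = 0.0
    for name, weight in variants:
        cumulative += weight
        if r <= cumulative:
            return name
    return variants[-1][0]  # guard against floating-point edge cases

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical 90/10 A/B split between two model versions
    variants = [("model-v1", 0.9), ("model-v2", 0.1)]
    counts = {"model-v1": 0, "model-v2": 0}
    for _ in range(10_000):
        counts[pick_variant(variants, rng)] += 1
    print(counts)  # roughly 9,000 requests to v1, 1,000 to v2
```

In the real service, the weights live in the endpoint configuration and can be updated in place, which is what enables the canary and blue/green strategies covered in Chapter 9.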
By the end of the book, you’ll confidently be able to apply Amazon SageMaker’s wide range of capabilities to the full spectrum of machine learning workflows.
What you will learn
- Perform data bias detection with SageMaker Data Wrangler and SageMaker Clarify
- Speed up data processing with SageMaker Feature Store
- Overcome labeling bias with SageMaker Ground Truth
- Improve training time with the monitoring and profiling capabilities of SageMaker Debugger
- Address the challenge of model deployment automation with CI/CD using the SageMaker model registry
- Explore SageMaker Neo for model optimization
- Implement data and model quality monitoring with Amazon SageMaker Model Monitor
- Improve training time and reduce costs with SageMaker data and model parallelism
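The data and model quality monitoring listed above works by capturing baseline statistics from training data and comparing live inference data against them on a schedule. As a minimal, stdlib-only sketch of that statistic-comparison idea (the feature names and the 3-sigma threshold are illustrative assumptions, not Model Monitor defaults or its API):

```python
import statistics

def baseline_stats(columns):
    """Summarize each numeric feature of a baseline (training) dataset.

    Model Monitor captures a similar baseline of per-feature statistics,
    then compares captured inference data against it on a schedule.
    `columns` maps feature name -> list of numeric values.
    """
    return {
        name: {"mean": statistics.mean(col), "stdev": statistics.pstdev(col)}
        for name, col in columns.items()
    }

def detect_drift(baseline, live_columns, threshold=3.0):
    """Flag features whose live mean drifts beyond `threshold` baseline stdevs.

    The threshold is an illustrative choice for this sketch.
    """
    drifted = []
    for name, stats in baseline.items():
        live_mean = statistics.mean(live_columns[name])
        spread = stats["stdev"] or 1e-9  # avoid division by zero
        if abs(live_mean - stats["mean"]) / spread > threshold:
            drifted.append(name)
    return drifted

if __name__ == "__main__":
    # Hypothetical air-quality features, echoing the book's running example
    base = baseline_stats({"pm2_5": [10, 12, 11, 9, 13], "temp": [20, 21, 19, 20, 22]})
    live = {"pm2_5": [40, 42, 41], "temp": [20, 21, 20]}  # pm2_5 has shifted
    print(detect_drift(base, live))  # only pm2_5 is flagged
```

The real service persists the baseline as statistics and constraints files and runs the comparison as a scheduled processing job, but the flag-on-deviation logic is the same shape.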
Who this book is for
This book is for expert data scientists responsible for building machine learning applications using Amazon SageMaker. Working knowledge of Amazon SageMaker, machine learning, deep learning, and experience using Jupyter Notebooks and Python is expected. Basic knowledge of AWS related to data, security, and monitoring will help you make the most of the book.
Table of Contents
- Amazon SageMaker Overview
- Data Science Environments
- Data Labeling with Amazon SageMaker Ground Truth
- Data Preparation at Scale Using Amazon SageMaker Data Wrangler and Processing
- Centralized Feature Repository with Amazon SageMaker Feature Store
- Training and Tuning at Scale
- Profile Training Jobs with Amazon SageMaker Debugger
- Managing Models at Scale Using a Model Registry
- Updating Production Models Using Amazon SageMaker Endpoint Production Variants
- Optimizing Model Hosting and Inference Costs
- Monitoring Production Models with Amazon SageMaker Model Monitor and Clarify
- Machine Learning Automated Workflows
- Well-Architected Machine Learning with Amazon SageMaker
- Managing SageMaker Features Across Accounts
Amazon SageMaker Best Practices
- Contributors
- About the authors
- About the reviewers

Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Share your thoughts

Section 1: Processing Data at Scale

Chapter 1: Amazon SageMaker Overview
- Technical requirements
- Preparing, building, training and tuning, deploying, and managing ML models
- Discussion of data preparation capabilities
  - SageMaker Ground Truth
  - SageMaker Data Wrangler
  - SageMaker Processing
  - SageMaker Feature Store
  - SageMaker Clarify
- Feature tour of model-building capabilities
  - SageMaker Studio
  - SageMaker notebook instances
  - SageMaker algorithms
  - BYO algorithms and scripts
- Feature tour of training and tuning capabilities
  - SageMaker training jobs
  - Autopilot
  - HPO
  - SageMaker Debugger
  - SageMaker Experiments
- Feature tour of model management and deployment capabilities
  - Model Monitor
  - Model endpoints
  - Edge Manager
- Summary

Chapter 2: Data Science Environments
- Technical requirements
- Machine learning use case and dataset
- Creating data science environment
  - Creating repeatability through IaC/CaC
  - Amazon SageMaker notebook instances
  - Amazon SageMaker Studio
- Providing and creating data science environments as IT services
  - Creating a portfolio in AWS Service Catalog
  - Amazon SageMaker notebook instances
  - Amazon SageMaker Studio
- Summary
- References

Chapter 3: Data Labeling with Amazon SageMaker Ground Truth
- Technical requirements
- Challenges with labeling data at scale
- Addressing unique labeling requirements with custom labeling workflows
  - A private labeling workforce
  - Listing the data to label
  - Creating the workflow
- Improving labeling quality using multiple workers
- Using active learning to reduce labeling time
- Security and permissions
- Summary

Chapter 4: Data Preparation at Scale Using Amazon SageMaker Data Wrangler and Processing
- Technical requirements
- Visual data preparation with Data Wrangler
  - Data inspection
  - Data transformation
  - Exporting the flow
- Bias detection and explainability with Data Wrangler and Clarify
- Data preparation at scale with SageMaker Processing
  - Loading the dataset
  - Drop columns
  - Converting data types
  - Scaling numeric fields
  - Featurizing the date
  - Simulating labels for air quality
  - Encoding categorical variables
  - Splitting and saving the dataset
- Summary

Chapter 5: Centralized Feature Repository with Amazon SageMaker Feature Store
- Technical requirements
- Amazon SageMaker Feature Store essentials
  - Creating feature groups
  - Populating feature groups
  - Retrieving features from feature groups
- Creating reusable features to reduce feature inconsistencies and inference latency
- Designing solutions for near real-time ML predictions
- Summary
- References

Section 2: Model Training Challenges

Chapter 6: Training and Tuning at Scale
- Technical requirements
- ML training at scale with SageMaker distributed libraries
  - Choosing between data and model parallelism
  - Scaling the compute resources
  - SageMaker distributed libraries
- Automated model tuning with SageMaker hyperparameter tuning
- Organizing and tracking training jobs with SageMaker Experiments
- Summary
- References

Chapter 7: Profile Training Jobs with Amazon SageMaker Debugger
- Technical requirements
- Amazon SageMaker Debugger essentials
  - Configuring a training job to use SageMaker Debugger
  - Analyzing the collected tensors and metrics
  - Taking action
- Real-time monitoring of training jobs using built-in and custom rules
- Gaining insight into the training infrastructure and training framework
  - Training a PyTorch model for weather prediction
  - Analyzing and visualizing the system and framework metrics generated by the profiler
  - Analyzing the profiler report generated by SageMaker Debugger
  - Analyzing and implementing recommendations from the profiler report
  - Comparing the two training jobs
- Summary
- Further reading

Section 3: Manage and Monitor Models

Chapter 8: Managing Models at Scale Using a Model Registry
- Technical requirements
- Using a model registry
- Choosing a model registry solution
  - Amazon SageMaker model registry
  - Building a custom model registry
  - Utilizing a third-party or OSS model registry
- Managing models using the Amazon SageMaker model registry
  - Creating a model package group
  - Creating a model package
- Summary

Chapter 9: Updating Production Models Using Amazon SageMaker Endpoint Production Variants
- Technical requirements
- Basic concepts of Amazon SageMaker Endpoint Production Variants
- Deployment strategies for updating ML models with SageMaker Endpoint Production Variants
  - Standard deployment
  - A/B deployment
  - Blue/Green deployment
  - Canary deployment
  - Shadow deployment
- Selecting an appropriate deployment strategy
  - Selecting a standard deployment
  - Selecting an A/B deployment
  - Selecting a Blue/Green deployment
  - Selecting a Canary deployment
  - Selecting a Shadow deployment
- Summary

Chapter 10: Optimizing Model Hosting and Inference Costs
- Technical requirements
- Real-time inference versus batch inference
  - Batch inference
  - Real-time inference
  - Cost comparison
- Deploying multiple models behind a single inference endpoint
  - Multiple versions of the same model
  - Multiple models
- Scaling inference endpoints to meet inference traffic demands
  - Setting the minimum and maximum capacity
  - Choosing a scaling metric
  - Setting the scaling policy
  - Setting the cooldown period
- Using Elastic Inference for deep learning models
- Optimizing models with SageMaker Neo
- Summary

Chapter 11: Monitoring Production Models with Amazon SageMaker Model Monitor and Clarify
- Technical requirements
- Basic concepts of Amazon SageMaker Model Monitor and Amazon SageMaker Clarify
- End-to-end architectures for monitoring ML models
  - Data drift monitoring
  - Model quality drift monitoring
  - Bias drift monitoring
  - Feature attribution drift monitoring
- Best practices for monitoring ML models
- Summary
- References

Section 4: Automate and Operationalize Machine Learning

Chapter 12: Machine Learning Automated Workflows
- Considerations for automating your SageMaker ML workflows
  - Typical ML workflows
  - Considerations and guidance for building SageMaker workflows and CI/CD pipelines
  - AWS-native options for automated workflow and CI/CD pipelines
- Building ML workflows with Amazon SageMaker Pipelines
  - Building your SageMaker pipeline
  - Data preparation step
  - Model build step
  - Model evaluation step
  - Conditional step
  - Register model step(s)
  - Creating the pipeline
  - Executing the pipeline
  - Pipeline recommended practices
- Creating CI/CD pipelines using Amazon SageMaker Projects
  - SageMaker projects recommended practices
- Summary

Chapter 13: Well-Architected Machine Learning with Amazon SageMaker
- Best practices for operationalizing ML workloads
  - Ensuring reproducibility
  - Tracking ML artifacts
  - Automating deployment pipelines
  - Monitoring production models
- Best practices for securing ML workloads
  - Isolating the ML environment
  - Disabling internet and root access
  - Enforcing authentication and authorization
  - Securing data and model artifacts
  - Logging, monitoring, and auditing
  - Meeting regulatory requirements
- Best practices for reliable ML workloads
  - Recovering from failure
  - Tracking model origin
  - Automating deployment pipelines
  - Handling unexpected traffic patterns
  - Continuous monitoring of deployed model
  - Updating model with new versions
- Best practices for building performant ML workloads
  - Rightsizing ML resources
  - Monitoring resource utilization
  - Rightsizing hosting infrastructure
  - Continuous monitoring of deployed model
- Best practices for cost-optimized ML workloads
  - Optimizing data labeling costs
  - Reducing experimentation costs with models from AWS Marketplace
  - Using AutoML to reduce experimentation time
  - Iterating locally with small datasets
  - Rightsizing training infrastructure
  - Optimizing hyperparameter-tuning costs
  - Saving training costs with Managed Spot Training
  - Using insights and recommendations from Debugger
  - Saving ML infrastructure costs with SavingsPlan
  - Optimizing inference costs
  - Stopping or terminating resources
- Summary

Chapter 14: Managing SageMaker Features Across Accounts
- Examining an overview of the AWS multi-account environment
- Understanding the benefits of using multiple AWS accounts with Amazon SageMaker
- Examining multi-account considerations with Amazon SageMaker
  - Considerations for SageMaker features
- Summary
- References

Why subscribe?
Other Books You May Enjoy
- Packt is searching for authors like you
- Share your thoughts
How to download the source code
1. Go to:
2. In the Find a repository… box, search the book title:
Amazon SageMaker Best Practices: Proven tips and tricks to build successful machine learning solutions on Amazon SageMaker. If the full title returns no results, search using the main title only.
3. Click the book title in the search results.
4. Click Code to download.
1. Disable the AdBlock plugin; otherwise, the download links may not appear.
2. Solve the CAPTCHA.
3. Click the download link.
4. You will be redirected to the download server, where the download begins.