Machine Learning Engineering on AWS: Build, scale, and secure machine learning systems and MLOps pipelines in production
- Length: 530 pages
- Edition: 1
- Language: English
- Publisher: Packt Publishing
- Publication Date: 2022-10-27
- ISBN-10: 1803247592
- ISBN-13: 9781803247595
- Sales Rank: #318003
Work seamlessly with production-ready machine learning systems and pipelines on AWS by addressing key pain points encountered in the ML life cycle
Key Features
- Gain practical knowledge of managing ML workloads on AWS using Amazon SageMaker, Amazon EKS, and more
- Use container and serverless services to solve a variety of ML engineering requirements
- Design, build, and secure automated MLOps pipelines and workflows on AWS
Book Description
There is a growing need for professionals with experience working on machine learning (ML) engineering requirements, as well as for those who know how to automate complex MLOps pipelines in the cloud. This book explores a variety of AWS services, such as Amazon Elastic Kubernetes Service, AWS Glue, AWS Lambda, Amazon Redshift, and AWS Lake Formation, which ML practitioners can leverage to meet various data engineering and ML engineering requirements in production.
This machine learning book covers the essential concepts, along with step-by-step instructions, designed to help you gain a solid understanding of how to manage and secure ML workloads in the cloud. As you progress through the chapters, you’ll discover how to use several container and serverless solutions when training and deploying TensorFlow and PyTorch deep learning models on AWS. You’ll also delve into proven cost optimization techniques, as well as data privacy and model privacy preservation strategies, as you explore best practices for using each AWS service.
By the end of this AWS book, you’ll be able to build, scale, and secure your own ML systems and pipelines, which will give you the experience and confidence needed to architect custom solutions using a variety of AWS services for ML engineering requirements.
What you will learn
- Find out how to train and deploy TensorFlow and PyTorch models on AWS
- Use containers and serverless services for ML engineering requirements
- Discover how to set up a serverless data warehouse and data lake on AWS
- Build automated end-to-end MLOps pipelines using a variety of services
- Use AWS Glue DataBrew and SageMaker Data Wrangler for data engineering
- Explore different solutions for deploying deep learning models on AWS
- Apply cost optimization techniques to ML environments and systems
- Preserve data privacy and model privacy using a variety of techniques
Who this book is for
This book is for machine learning engineers, data scientists, and AWS cloud engineers interested in working on production data engineering, machine learning engineering, and MLOps requirements using a variety of AWS services such as Amazon EC2, Amazon Elastic Kubernetes Service (EKS), Amazon SageMaker, AWS Glue, Amazon Redshift, AWS Lake Formation, and AWS Lambda. All you need is an AWS account to get started. Prior knowledge of AWS, machine learning, and the Python programming language will help you grasp the concepts covered in this book more effectively.
Table of Contents
Copyright © 2022 Packt Publishing
Contributors
About the author
About the reviewers
Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Share Your Thoughts
Part 1: Getting Started with Machine Learning Engineering on AWS
Chapter 1: Introduction to ML Engineering on AWS
- Technical requirements
- What is expected from ML engineers?
- How ML engineers can get the most out of AWS
- Essential prerequisites
- Creating the Cloud9 environment
- Increasing Cloud9’s storage
- Installing the Python prerequisites
- Preparing the dataset
- Generating a synthetic dataset using a deep learning model
- Exploratory data analysis
- Train-test split
- Uploading the dataset to Amazon S3
- AutoML with AutoGluon
- Setting up and installing AutoGluon
- Performing your first AutoGluon AutoML experiment
- Getting started with SageMaker and SageMaker Studio
- Onboarding with SageMaker Studio
- Adding a user to an existing SageMaker Domain
- No-code machine learning with SageMaker Canvas
- AutoML with SageMaker Autopilot
- Summary
- Further reading
Chapter 2: Deep Learning AMIs
- Technical requirements
- Getting started with Deep Learning AMIs
- Launching an EC2 instance using a Deep Learning AMI
- Locating the framework-specific DLAMI
- Choosing the instance type
- Ensuring a default secure configuration
- Launching the instance and connecting to it using EC2 Instance Connect
- Downloading the sample dataset
- Training an ML model
- Loading and evaluating the model
- Cleaning up
- Understanding how AWS pricing works for EC2 instances
- Using multiple smaller instances to reduce the overall cost of running ML workloads
- Using spot instances to reduce the cost of running training jobs
- Summary
- Further reading
Chapter 3: Deep Learning Containers
- Technical requirements
- Getting started with AWS Deep Learning Containers
- Essential prerequisites
- Preparing the Cloud9 environment
- Downloading the sample dataset
- Using AWS Deep Learning Containers to train an ML model
- Serverless ML deployment with Lambda’s container image support
- Building the custom container image
- Testing the container image
- Pushing the container image to Amazon ECR
- Running ML predictions on AWS Lambda
- Completing and testing the serverless API setup
- Summary
- Further reading
Part 2: Solving Data Engineering and Analysis Requirements
Chapter 4: Serverless Data Management on AWS
- Technical requirements
- Getting started with serverless data management
- Preparing the essential prerequisites
- Opening a text editor on your local machine
- Creating an IAM user
- Creating a new VPC
- Uploading the dataset to S3
- Running analytics at scale with Amazon Redshift Serverless
- Setting up a Redshift Serverless endpoint
- Opening Redshift query editor v2
- Creating a table
- Loading data from S3
- Querying the database
- Unloading data to S3
- Setting up Lake Formation
- Creating a database
- Creating a table using an AWS Glue Crawler
- Using Amazon Athena to query data in Amazon S3
- Setting up the query result location
- Running SQL queries using Athena
- Summary
- Further reading
Chapter 5: Pragmatic Data Processing and Analysis
- Technical requirements
- Getting started with data processing and analysis
- Preparing the essential prerequisites
- Downloading the Parquet file
- Preparing the S3 bucket
- Automating data preparation and analysis with AWS Glue DataBrew
- Creating a new dataset
- Creating and running a profile job
- Creating a project and configuring a recipe
- Creating and running a recipe job
- Verifying the results
- Preparing ML data with Amazon SageMaker Data Wrangler
- Accessing Data Wrangler
- Importing data
- Transforming the data
- Analyzing the data
- Exporting the data flow
- Turning off the resources
- Verifying the results
- Summary
- Further reading
Part 3: Diving Deeper with Relevant Model Training and Deployment Solutions
Chapter 6: SageMaker Training and Debugging Solutions
- Technical requirements
- Getting started with the SageMaker Python SDK
- Preparing the essential prerequisites
- Creating a service limit increase request
- Training an image classification model with the SageMaker Python SDK
- Creating a new Notebook in SageMaker Studio
- Downloading the training, validation, and test datasets
- Uploading the data to S3
- Using the SageMaker Python SDK to train an ML model
- Using the %store magic to store data
- Using the SageMaker Python SDK to deploy an ML model
- Using the Debugger Insights Dashboard
- Utilizing Managed Spot Training and Checkpoints
- Cleaning up
- Summary
- Further reading
Chapter 7: SageMaker Deployment Solutions
- Technical requirements
- Getting started with model deployments in SageMaker
- Preparing the pre-trained model artifacts
- Preparing the SageMaker script mode prerequisites
- Preparing the inference.py file
- Preparing the requirements.txt file
- Preparing the setup.py file
- Deploying a pre-trained model to a real-time inference endpoint
- Deploying a pre-trained model to a serverless inference endpoint
- Deploying a pre-trained model to an asynchronous inference endpoint
- Creating the input JSON file
- Adding an artificial delay to the inference script
- Deploying and testing an asynchronous inference endpoint
- Cleaning up
- Deployment strategies and best practices
- Summary
- Further reading
Part 4: Securing, Monitoring, and Managing Machine Learning Systems and Environments
Chapter 8: Model Monitoring and Management Solutions
- Technical prerequisites
- Registering models to SageMaker Model Registry
- Creating a new notebook in SageMaker Studio
- Registering models to SageMaker Model Registry using the boto3 library
- Deploying models from SageMaker Model Registry
- Enabling data capture and simulating predictions
- Scheduled monitoring with SageMaker Model Monitor
- Analyzing the captured data
- Deleting an endpoint with a monitoring schedule
- Cleaning up
- Summary
- Further reading
Chapter 9: Security, Governance, and Compliance Strategies
- Managing the security and compliance of ML environments
- Authentication and authorization
- Network security
- Encryption at rest and in transit
- Managing compliance reports
- Vulnerability management
- Preserving data privacy and model privacy
- Federated Learning
- Differential Privacy
- Privacy-preserving machine learning
- Other solutions and options
- Establishing ML governance
- Lineage Tracking and reproducibility
- Model inventory
- Model validation
- ML explainability
- Bias detection
- Model monitoring
- Traceability, observability, and auditing
- Data quality analysis and reporting
- Data integrity management
- Summary
- Further reading
Part 5: Designing and Building End-to-end MLOps Pipelines
Chapter 10: Machine Learning Pipelines with Kubeflow on Amazon EKS
- Technical requirements
- Diving deeper into Kubeflow, Kubernetes, and EKS
- Preparing the essential prerequisites
- Preparing the IAM role for the EC2 instance of the Cloud9 environment
- Attaching the IAM role to the EC2 instance of the Cloud9 environment
- Updating the Cloud9 environment with the essential prerequisites
- Setting up Kubeflow on Amazon EKS
- Running our first Kubeflow pipeline
- Using the Kubeflow Pipelines SDK to build ML workflows
- Cleaning up
- Recommended strategies and best practices
- Summary
- Further reading
Chapter 11: Machine Learning Pipelines with SageMaker Pipelines
- Technical requirements
- Diving deeper into SageMaker Pipelines
- Preparing the essential prerequisites
- Running our first pipeline with SageMaker Pipelines
- Defining and preparing our first ML pipeline
- Running our first ML pipeline
- Creating Lambda functions for deployment
- Preparing the Lambda function for deploying a model to a new endpoint
- Preparing the Lambda function for checking whether an endpoint exists
- Preparing the Lambda function for deploying a model to an existing endpoint
- Testing our ML inference endpoint
- Completing the end-to-end ML pipeline
- Defining and preparing the complete ML pipeline
- Running the complete ML pipeline
- Cleaning up
- Recommended strategies and best practices
- Summary
- Further reading
Index
Why subscribe?
Other Books You May Enjoy
Packt is searching for authors like you
Share Your Thoughts
How to download the source code?
1. Go to: https://github.com/PacktPublishing
2. In the Find a repository… box, search for the book title: Machine Learning Engineering on AWS: Build, scale, and secure machine learning systems and MLOps pipelines in production. If no results appear, search for the main title only.
3. Click the book title in the search results.
4. Click Code to download.
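If you prefer the command line, the browser steps above can be replaced by cloning the repository directly. The repository name used below follows Packt's usual convention of hyphenating the book's main title; treat it as an assumption and confirm the exact name via the search in step 2.

```shell
# Clone the book's companion code repository (alternative to clicking
# Code > Download ZIP in the browser).
# NOTE: the repository name is assumed from Packt's usual naming
# convention (hyphenated main title); verify it in the search results.
git clone https://github.com/PacktPublishing/Machine-Learning-Engineering-on-AWS.git
cd Machine-Learning-Engineering-on-AWS
ls   # chapter folders containing the example notebooks and scripts
```

Cloning also lets you pull future corrections with `git pull` instead of re-downloading the archive.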
1. Disable the AdBlock plugin; otherwise, the download links may not appear.
2. Solve the CAPTCHA.
3. Click the download link.
4. You will be redirected to the download server, where the download begins.