Applied Machine Learning and High Performance Computing on AWS: Accelerate development of machine learning applications following architectural best practices

by Farooq Sabir, Mani Khanuja, Shreyas Subramanian, Trenton Potgieter

Length: 398 pages
Edition: 1
Language: English
Publisher: Packt Publishing
Publication Date: 2022-12-09
ISBN-10: 1803237015
ISBN-13: 9781803237015
Sales Rank: #12008312

Build, train, and deploy large machine learning models at scale in various domains such as computational fluid dynamics, genomics, autonomous vehicles, and numerical optimization using Amazon SageMaker.

Key Features

Understand the need for High Performance Computing (HPC).
Build, train, and deploy large ML models with billions of parameters using Amazon SageMaker.
Learn best practices and architectures for implementing ML at scale using HPC.

Book Description

Machine Learning (ML) and High Performance Computing (HPC) on AWS run compute-intensive workloads across industries and emerging applications. Their use cases span verticals such as computational fluid dynamics (CFD), genomics, and autonomous vehicles.

The book provides end-to-end guidance, starting with HPC concepts for storage and networking. Part 2 goes deeper, with working examples of how to process large datasets using SageMaker Studio and EMR, and how to build, train, and deploy large models using distributed training. It also covers deploying models to edge devices using SageMaker and IoT Greengrass, and optimizing the performance of ML models for low-latency use cases.

By the end of this book, you will be able to build, train, and deploy your own large-scale ML application using HPC on AWS, following industry best practices and addressing the key pain points encountered in the application life cycle.

What you will learn

Data management, storage, and fast networking for HPC applications
Analysis and visualization of a large volume of data using Spark
Train a visual transformer model using SageMaker distributed training
Deploy and manage ML models at scale in the cloud and at the edge
Performance optimization of ML models for low-latency workloads
Apply HPC to industry domains like CFD, genomics, autonomous vehicles (AV), and numerical optimization

Who This Book Is For

The book begins with HPC concepts but expects you to have prior machine learning knowledge. It is for ML engineers and data scientists interested in advanced topics: using large datasets to train large models with distributed training on AWS, deploying models at scale, and optimizing performance for low-latency use cases. It is also useful for practitioners in fields such as numerical optimization, computational fluid dynamics, autonomous vehicles, and genomics who require HPC to apply ML models to applications at scale.

Applied Machine Learning and High-Performance Computing on AWS
Contributors
About the authors
About the reviewers
Preface
    Who this book is for
    What this book covers
    To get the most out of this book
    Download the example code files
    Download the color images
    Conventions used
    Get in touch
    Share Your Thoughts
    Download a free PDF copy of this book
Part 1: Introducing High-Performance Computing
Chapter 1: High-Performance Computing Fundamentals
    Why do we need HPC?
    Limitations of on-premises HPC
        Barrier to innovation
        Reduced efficiency
        Lost opportunities
        Limited scalability and elasticity
    Benefits of doing HPC on the cloud
        Drives innovation
        Enables secure collaboration among distributed teams
        Amplifies operational efficiency
        Optimizes performance
        Optimizes cost
    Driving innovation across industries with HPC
        Life sciences and healthcare
        AVs
        Supply chain optimization
    Summary
    Further reading
Chapter 2: Data Management and Transfer
    Importance of data management
    Challenges of moving data into the cloud
    How to securely transfer large amounts of data into the cloud
    AWS online data transfer services
        AWS DataSync
        AWS Transfer Family
        Amazon S3 Transfer Acceleration
        Amazon Kinesis
        AWS Snowcone
    AWS offline data transfer services
        Process for ordering a device from AWS Snow Family
    Summary
    Further reading
Chapter 3: Compute and Networking
    Introducing the AWS compute ecosystem
        General purpose instances
        Compute optimized instances
        Accelerated compute instances
        Memory optimized instances
        Storage optimized instances
        Amazon Machine Images (AMIs)
        Containers on AWS
        Serverless compute on AWS
    Networking on AWS
        CIDR blocks and routing
        Networking for HPC workloads
    Selecting the right compute for HPC workloads
        Pattern 1 – a standalone instance
        Pattern 2 – using AWS ParallelCluster
        Pattern 3 – using AWS Batch
        Pattern 4 – hybrid architecture
        Pattern 5 – container-based distributed processing
        Pattern 6 – serverless architecture
    Best practices for HPC workloads
    Summary
    References
Chapter 4: Data Storage
    Technical requirements
    AWS services for storing data
        Amazon Simple Storage Service (S3)
        Amazon Elastic File System (EFS)
        Amazon EBS
        Amazon FSx
    Data security and governance
        IAM
        Data protection
        Data encryption
        Logging and monitoring
        Resilience
    Tiered storage for cost optimization
        Amazon S3 storage classes
        Amazon EFS storage classes
    Choosing the right storage option for HPC workloads
    Summary
    Further reading
Part 2: Applied Modeling
Chapter 5: Data Analysis
    Technical requirements
    Exploring data analysis methods
        Gathering the data
        Understanding the data structure
        Describing the data
        Visualizing the data
        Reviewing the data analytics life cycle
    Reviewing the AWS services for data analysis
        Unifying the data into a common store
        Creating a data structure for analysis
        Visualizing the data at scale
        Choosing the right AWS service
    Analyzing large amounts of structured and unstructured data
        Setting up EMR and SageMaker Studio
        Analyzing large amounts of structured data
        Analyzing large amounts of unstructured data
    Processing data at scale on AWS
    Cleaning up
    Summary
Chapter 6: Distributed Training of Machine Learning Models
    Technical requirements
    Building ML systems using AWS
    Introducing the fundamentals of distributed training
        Reviewing the SageMaker distributed data parallel strategy
        Reviewing the SageMaker model data parallel strategy
        Reviewing a hybrid data parallel and model parallel strategy
    Executing a distributed training workload on AWS
        Executing distributed data parallel training on Amazon SageMaker
        Executing distributed model parallel training on Amazon SageMaker
    Summary
Chapter 7: Deploying Machine Learning Models at Scale
    Managed deployment on AWS
        Amazon SageMaker managed model deployment options
        The variety of compute resources available
        Cost-effective model deployment
        Blue/green deployments
        Inference recommender
        MLOps integration
        Model registry
        Elastic inference
        Deployment on edge devices
    Choosing the right deployment option
        Using batch inference
        Using real-time endpoints
        Using asynchronous inference
    Batch inference
        Creating a transformer object
        Creating a batch transform job for carrying out inference
        Optimizing a batch transform job
    Real-time inference
        Hosting a machine learning model as a real-time endpoint
    Asynchronous inference
    The high availability of model endpoints
        Deployment on multiple instances
        Endpoints autoscaling
        Endpoint modification without disruption
    Blue/green deployments
        All at once
        Canary
        Linear
    Summary
    References
Chapter 8: Optimizing and Managing Machine Learning Models for Edge Deployment
    Technical requirements
    Understanding edge computing
    Reviewing the key considerations for optimal edge deployments
        Efficiency
        Performance
        Reliability
        Security
    Designing an architecture for optimal edge deployments
        Building the edge components
        Building the ML model
        Deploying the model package
    Summary
Chapter 9: Performance Optimization for Real-Time Inference
    Technical requirements
    Reducing the memory footprint of DL models
        Pruning
        Quantization
        Model compilation
    Key metrics for optimizing models
    Choosing the instance type, load testing, and performance tuning for models
    Observing the results
    Summary
Chapter 10: Data Visualization
    Data visualization using Amazon SageMaker Data Wrangler
        SageMaker Data Wrangler visualization options
        Adding visualizations to the data flow in SageMaker Data Wrangler
        Data flow
    Amazon’s graphics-optimized instances
        Benefits and key features of Amazon’s graphics-optimized instances
    Summary
    Further reading
Part 3: Driving Innovation Across Industries
Chapter 11: Computational Fluid Dynamics
    Technical requirements
    Introducing CFD
    Reviewing best practices for running CFD on AWS
        Using AWS ParallelCluster
        Using CFD Direct
    Discussing how ML can be applied to CFD
    Summary
    References
Chapter 12: Genomics
    Technical requirements
    Managing large genomics data on AWS
    Designing architecture for genomics
    Applying ML to genomics
        Protein secondary structure prediction for protein sequences
    Summary
Chapter 13: Autonomous Vehicles
    Technical requirements
    Introducing AV systems
    AWS services supporting AV systems
    Designing an architecture for AV systems
    ML applied to AV systems
        Model development
        Step 1 – build and push the CARLA container to Amazon ECR
        Step 2 – configure and run CARLA on RoboMaker
    Summary
    References
Chapter 14: Numerical Optimization
    Introduction to optimization
        Goal or objective function
        Variables
        Constraints
        Modeling an optimization problem
        Optimization algorithm
        Local and global optima
    Common numerical optimization algorithms
        Random restart hill climbing
        Simulated annealing
        Tabu search
        Evolutionary methods
    Example use cases of large-scale numerical optimization problems
        Traveling salesperson optimization problem
        Worker dispatch optimization
        Assembly line optimization
    Numerical optimization using high-performance compute on AWS
        Commercial optimization solvers
        Open source optimization solvers
        Numerical optimization patterns on AWS
    Machine learning and numerical optimization
    Summary
    Further reading
Index
    Why subscribe?
Other Books You May Enjoy
    Packt is searching for authors like you
    Share Your Thoughts
    Download a free PDF copy of this book

How to download source code?

1. Go to: https://github.com/PacktPublishing

2. In the Find a repository… box, search for the book title: Applied Machine Learning and High Performance Computing on AWS: Accelerate development of machine learning applications following architectural best practices. If the full title returns no results, search for the main title only.