Learning Ray: Flexible Distributed Python for Machine Learning
- Length: 271 pages
- Edition: 1
- Language: English
- Publisher: O'Reilly Media
- Publication Date: 2023-03-28
- ISBN-10: 1098117220
- ISBN-13: 9781098117221
- Sales Rank: #1233034
Get started with Ray, the open source distributed computing framework that simplifies the process of scaling compute-intensive Python workloads. With this practical book, Python programmers, data engineers, and data scientists will learn how to leverage Ray locally and spin up compute clusters. You’ll be able to use Ray to structure and run machine learning programs at scale.
Authors Max Pumperla, Edward Oakes, and Richard Liaw show you how to build machine learning applications with Ray. You’ll understand how Ray fits into the current landscape of machine learning tools and how it continues to integrate ever more tightly with them. Distributed computing is hard, but Ray makes it easy to get started.
- Learn how to build your first distributed applications with Ray Core
- Conduct hyperparameter optimization with Ray Tune
- Use the Ray RLlib library for reinforcement learning
- Manage distributed training with the Ray Train library
- Use Ray to perform data processing with Ray Datasets
- Learn how to work with Ray Clusters and serve models with Ray Serve
- Build end-to-end machine learning applications with Ray AIR
Table of Contents
- Foreword
- Preface: Who Should Read This Book; Goals of This Book; Navigating This Book; How to Use the Code Examples; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
- 1. An Overview of Ray: What Is Ray?; What Led to Ray?; Ray’s Design Principles; Simplicity and abstraction; Flexibility and heterogeneity; Speed and scalability; Three Layers: Core, Libraries, and Ecosystem; A Distributed Computing Framework; A Suite of Data Science Libraries; Ray AIR and the Data Science Workflow; Data Processing with Ray Datasets; Model Training; Reinforcement learning with Ray RLlib; Distributed training with Ray Train; Hyperparameter Tuning; Model Serving; A Growing Ecosystem; Summary
- 2. Getting Started with Ray Core: An Introduction to Ray Core; A First Example Using the Ray API; Functions and remote Ray tasks; Using the object store with put and get; Using Ray’s wait function for nonblocking calls; Handling task dependencies; From classes to actors; An Overview of the Ray Core API; Understanding Ray System Components; Scheduling and Executing Work on a Node; The Head Node; Distributed Scheduling and Execution; A Simple MapReduce Example with Ray; Mapping and Shuffling Document Data; Reducing Word Counts; Summary
- 3. Building Your First Distributed Application: Introducing Reinforcement Learning; Setting Up a Simple Maze Problem; Building a Simulation; Training a Reinforcement Learning Model; Building a Distributed Ray App; Recapping RL Terminology; Summary
- 4. Reinforcement Learning with Ray RLlib: An Overview of RLlib; Getting Started with RLlib; Building a Gym Environment; Running the RLlib CLI; Using the RLlib Python API; Training RLlib algorithms; Saving, loading, and evaluating RLlib models; Computing actions; Accessing policy and model states; Configuring RLlib Experiments; Resource Configuration; Rollout Worker Configuration; Environment Configuration; Working with RLlib Environments; An Overview of RLlib Environments; Working with Multiple Agents; Working with Policy Servers and Clients; Defining a server; Defining a client; Advanced Concepts; Building an Advanced Environment; Applying Curriculum Learning; Working with Offline Data; Other Advanced Topics; Summary
- 5. Hyperparameter Optimization with Ray Tune: Tuning Hyperparameters; Building a Random Search Example with Ray; Why Is HPO Hard?; An Introduction to Tune; How Does Tune Work?; Search algorithms; Schedulers; Configuring and Running Tune; Specifying resources; Callbacks and metrics; Checkpoints, stopping, and resuming; Custom and conditional search spaces; Machine Learning with Tune; Using RLlib with Tune; Tuning Keras Models; Summary
- 6. Data Processing with Ray: Ray Datasets; Ray Datasets Basics; Creating a Ray Dataset; Reading from and writing to storage; Built-in transformations; Blocks and repartitioning; Schemas and data formats; Computing Over Ray Datasets; Dataset Pipelines; Example: Training Copies of a Classifier in Parallel; External Library Integrations; Building an ML Pipeline; Summary
- 7. Distributed Training with Ray Train: The Basics of Distributed Model Training; Introduction to Ray Train by Example; Predicting Big Tips in NYC Taxi Rides; Loading, Preprocessing, and Featurization; Defining a Deep Learning Model; Distributed Training with Ray Train; Distributed Batch Inference; More on Trainers in Ray Train; Migrating to Ray Train with Minimal Code Changes; Scaling Out Trainers; Preprocessing with Ray Train; Integrating Trainers with Ray Tune; Using Callbacks to Monitor Training; Summary
- 8. Online Inference with Ray Serve: Key Characteristics of Online Inference; ML Models Are Compute Intensive; ML Models Aren’t Useful in Isolation; An Introduction to Ray Serve; Architectural Overview; Defining a Basic HTTP Endpoint; Scaling and Resource Allocation; Request Batching; Multimodel Inference Graphs; Core feature: binding multiple deployments; Pattern 1: Pipelining; Pattern 2: Broadcasting; Pattern 3: Conditional logic; End-to-End Example: Building an NLP-Powered API; Fetching Content and Preprocessing; NLP Models; HTTP Handling and Driver Logic; Putting It All Together; Summary
- 9. Ray Clusters: Manually Creating a Ray Cluster; Deployment on Kubernetes; Setting Up Your First KubeRay Cluster; Interacting with the KubeRay Cluster; Running Ray programs with kubectl; Using the Ray Job Submission server; Ray Client; Exposing KubeRay; Configuring KubeRay; Configuring Logging for KubeRay; Using the Ray Cluster Launcher; Configuring Your Ray Cluster; Using the Cluster Launcher CLI; Interacting with a Ray Cluster; Working with Cloud Clusters; AWS; Using Other Cloud Providers; Autoscaling; Summary
- 10. Getting Started with the Ray AI Runtime: Why Use AIR?; Key AIR Concepts by Example; Ray Datasets and Preprocessors; Trainers; Tuners and Checkpoints; Batch Predictors; Deployments; Workloads That Are Suited for AIR; AIR Workload Execution; Stateless execution; Stateful execution; Composite workload execution; Online serving execution; AIR Memory Management; AIR Failure Model; Autoscaling AIR Workloads; Summary
- 11. Ray’s Ecosystem and Beyond: A Growing Ecosystem; Data Loading and Processing; Model Training; Model Serving; Building Custom Integrations; An Overview of Ray’s Integrations; Ray and Other Systems; Distributed Python Frameworks; Ray AIR and the Broader ML Ecosystem; How to Integrate AIR into Your ML Platform; Where to Go from Here?; Summary
- Index
How to download the source code?
1. Go to https://www.oreilly.com/
2. Search for the book title: Learning Ray: Flexible Distributed Python for Machine Learning. If no results appear, search for the main title only.
3. Click the book title in the search results.
4. In the Publisher resources section, click Download Example Code.

1. Disable the AdBlock plugin; otherwise, you may not get any download links.
2. Solve the CAPTCHA.
3. Click the download link.
4. You will be taken to the download server to download the file.