Effective Data Science Infrastructure: How to make data scientists productive
- Length: 325 pages
- Edition: 1
- Language: English
- Publisher: Manning
- Publication Date: 2022-06-28
- ISBN-10: 1617299197
- ISBN-13: 9781617299193
- Sales Rank: #2912268
Simplify data science infrastructure to give data scientists an efficient path from prototype to production.
Effective Data Science Infrastructure is a hands-on guide to assembling infrastructure for data science and machine learning applications. It reveals the processes used at Netflix and other data-driven companies to manage their cutting-edge data infrastructure.
As you work through this easy-to-follow guide, you’ll set up end-to-end infrastructure from the ground up, with a fully customizable process you can easily adapt to your company. You’ll learn how you can make data scientists more productive with your existing cloud infrastructure, a stack of open source software, and idiomatic Python. Throughout, you’ll follow a human-centric approach focused on user experience and meeting the unique needs of data scientists.
Contents of Effective Data Science Infrastructure (MEAP V07):
- Copyright
- welcome
- Brief contents
- Chapter 1: Introduction
  - 1.1 Why Data Science Infrastructure
    - 1.1.1 Lifecycle of a Data Science Project
  - 1.2 What is Data Science Infrastructure
    - 1.2.1 The Infrastructure Stack for Data Science
    - 1.2.2 Taming Complexity
    - 1.2.3 Leveraging Existing Platforms
  - 1.3 Human-Centric Infrastructure
    - 1.3.1 Data scientist autonomy
  - 1.4 Summary
- Chapter 2: The Toolchain of Data Science
  - 2.1 Setting up a Development Environment
    - 2.1.1 Cloud Account
    - 2.1.2 Data Science Workstation
    - 2.1.3 Notebooks
    - 2.1.4 Putting things together
  - 2.2 Introducing Workflows
    - 2.2.1 The basics of workflows
    - 2.2.2 Executing workflows
    - 2.2.3 The world of workflow frameworks
  - 2.3 Summary
- Chapter 3: Introducing Metaflow
  - 3.1 Basics of Metaflow
    - 3.1.1 Writing a basic workflow
    - 3.1.2 Managing data flow in workflows
    - 3.1.3 Parameters
  - 3.2 Branching and merging
    - 3.2.1 Valid DAG structures
    - 3.2.2 Static branches
    - 3.2.3 Dynamic branches
    - 3.2.4 Controlling Concurrency
  - 3.3 Metaflow in Action
    - 3.3.1 Starting a new project
    - 3.3.2 Accessing results with the Client API
    - 3.3.3 Debugging failures
    - 3.3.4 Finishing touches
  - 3.4 Summary
- Chapter 4: Scaling with The Compute Layer
  - 4.1 What is Scalability
    - 4.1.1 Scaling organizations
  - 4.2 The Compute Layer
    - 4.2.1 Batch processing with containers
    - 4.2.2 Examples of compute layers
  - 4.3 The compute layer in Metaflow
    - 4.3.1 Configuring AWS Batch for Metaflow
    - 4.3.2 @batch and @resources decorators
  - 4.4 Handling failures
    - 4.4.1 Recovering from transient errors with @retry
    - 4.4.2 Killing zombies with @timeout
    - 4.4.3 The decorator of the last resort: @catch
  - 4.5 Summary
- Chapter 5: Practicing Scalability and Performance
  - 5.1 Starting simple: Vertical scalability
    - 5.1.1 Example: Clustering Yelp Reviews
    - 5.1.2 Practicing vertical scalability
    - 5.1.3 Why vertical scalability
  - 5.2 Practicing Horizontal Scalability
    - 5.2.1 Why horizontal scalability
    - 5.2.2 Example: Hyperparameter search
  - 5.3 Practicing performance optimization
    - 5.3.1 Example: Computing a co-occurrence matrix
    - 5.3.2 Recipe for fast enough workflows
  - 5.4 Summary
- Chapter 6: Going to Production
  - 6.1 Stable workflow scheduling
    - 6.1.1 Centralized metadata
    - 6.1.2 Using AWS Step Functions with Metaflow
    - 6.1.3 Scheduling runs with @schedule
  - 6.2 Stable execution environments
    - 6.2.1 How Metaflow packages flows
    - 6.2.2 Why dependency management matters
    - 6.2.3 Using the @conda decorator
  - 6.3 Stable operations
    - 6.3.1 Namespaces during prototyping
    - 6.3.2 Production namespaces
    - 6.3.3 Parallel deployments with @project
  - 6.4 Summary
- Chapter 7: Processing Data
  - 7.1 Foundations of Fast Data
    - 7.1.1 Loading data from S3
    - 7.1.2 Working with tabular data
    - 7.1.3 The in-memory data stack
  - 7.2 Interfacing with Data Infrastructure
    - 7.2.1 Modern data infrastructure
    - 7.2.2 Preparing datasets in SQL
    - 7.2.3 Distributed data processing
  - 7.3 From Data to Features
    - 7.3.1 Encoding features
  - 7.4 Summary
- Chapter 8: Using and Operating Models
  - 8.1 Producing Predictions
    - Batch, streaming, and real-time predictions
    - 8.1.1 Example: Recommendation system
      - Training a rudimentary recommendations model
    - 8.1.2 Batch predictions
      - Producing recommendations
      - Sharing results robustly
      - Producing a batch of recommendations
      - Using recommendations in a web app
    - 8.1.3 Real-time predictions
      - Example: Real-time movie recommendations
  - 8.2 Summary
- Chapter 9: Machine Learning With the Full Stack
  - 9.1 Pluggable Feature Encoders and Models
    - 9.1.1 Pluggable models and feature encoders
      - Defining a feature encoder
      - Loading and executing plugins
    - 9.1.2 Benchmarking models
      - Model workflow
  - 9.2 Deep Regression Model
    - 9.2.1 Encoding input tensors
      - Data loader
    - 9.2.2 Defining a deep regression model
    - 9.2.3 Training a deep regression model
      - Small-scale training
      - Large-scale training
  - 9.3 Summarizing Lessons Learned
  - 9.4 Summary