Automated Machine Learning for Business
- Length: 352 pages
- Edition: 1
- Language: English
- Publisher: Oxford University Press
- Publication Date: 2021-06-10
- ISBN-10: 0190941669
- ISBN-13: 9780190941666
- Sales Rank: #2735037
Teaches the machine learning process for business students and professionals using automated machine learning, a new development in data science that requires only a few weeks to learn instead of years of training.
Though the concept of computers learning to solve a problem may still conjure thoughts of futuristic artificial intelligence, the reality is that machine learning algorithms now exist within most major software, including websites and even word processors. These algorithms are transforming society in the most radical way since the Industrial Revolution, primarily through automating tasks such as deciding which users to advertise to, which machines are likely to break down, and which stocks to buy and sell. While this work no longer always requires advanced technical expertise, it is crucial that practitioners and students alike understand the world of machine learning.
In this book, Kai R. Larsen and Daniel S. Becker teach the machine learning process using a new development in data science: automated machine learning (AutoML). AutoML, when implemented properly, makes machine learning accessible by removing the need for years of experience in the most arcane aspects of data science, such as math, statistics, and computer science. Larsen and Becker demonstrate how anyone trained in the use of AutoML can use it to test their ideas and support the quality of those ideas during presentations to management and stakeholder groups. Because the requisite investment is a few weeks rather than a few years of training, these tools will likely become a core component of undergraduate and graduate programs alike.
With first-hand examples from the industry-leading DataRobot platform, Automated Machine Learning for Business provides a clear overview of the process and engages with essential tools for the future of data science.
Cover
Title Page
Copyright Page
Contents
Preface
Automated Machine Learning (AutoML)
A Note to Instructors
Acknowledgments
Book Outline
Dataset Download
Copyrights

Section I: Why Use Automated Machine Learning?
1. What Is Machine Learning?
    1.1 Why Learn This?
    1.2 Machine Learning Is Everywhere
    1.3 What Is Machine Learning?
    1.4 Data for Machine Learning
    1.5 Exercises
2. Automating Machine Learning
    2.1 What Is Automated Machine Learning?
    2.2 What Automated Machine Learning Is Not
    2.3 Available Tools and Platforms
    2.4 Eight Criteria for AutoML Excellence
    2.5 How Do the Fundamental Principles of Machine Learning and Artificial Intelligence Transfer to AutoML? A Point-by-Point Evaluation.
    2.6 Exercises

Section II: Defining Project Objectives
3. Specify Business Problem
    3.1 Why Start with a Business Problem?
    3.2 Problem Statements
    3.3 Exercises
4. Acquire Subject Matter Expertise
    4.1 Importance of Subject Matter Expertise
    4.2 Exercises
5. Define Prediction Target
    5.1 What Is a Prediction Target?
    5.2 How Is the Target Important for Machine Learning?
    5.3 Exercises/Discussion
6. Decide on Unit of Analysis
    6.1 What Is a Unit of Analysis?
    6.2 How to Determine Unit of Analysis
    6.3 Exercises
7. Success, Risk, and Continuation
    7.1 Identify Success Criteria
    7.2 Foresee Risks
    7.3 Decide Whether to Continue
    7.4 Exercises

Section III: Acquire and Integrate Data
8. Accessing and Storing Data
    8.1 Track Down Relevant Data
    8.2 Examine Data and Remove Columns
    8.3 Example Dataset
    8.4 Exercises
9. Data Integration
    9.1 Joins
    9.2 Exercises
10. Data Transformations
    10.1 Splitting and Extracting New Columns
        10.1.1 IF-THEN Statements and One-hot Encoding
        10.1.2 Regular Expressions (RegEx)
    10.2 Transformations
    10.3 Exercises
11. Summarization
    11.1 Summarize
    11.2 Crosstab
    11.3 Exercises
12. Data Reduction and Splitting
    12.1 Unique Rows
    12.2 Filtering
    12.3 Combining the data
    12.4 Exercises

Section IV: Model Data
13. Startup Processes
    13.1 Uploading Data
    13.2 Exercise
14. Feature Understanding and Selection
    14.1 Descriptive Statistics
    14.2 Data Types
    14.3 Evaluations of Feature Content
    14.4 Missing Values
    14.5 Exercises
15. Build Candidate Models
    15.1 Starting the Process
    15.2 Advanced Options
    15.3 Starting the Analytical Process
    15.4 Model Selection Process
        15.4.1 Tournament Round 1: 32% Sample
        15.4.2 Tournament Round 2: 64% Sample
        15.4.3 Tournament Round 3: Cross Validation
        15.4.4 Tournament Round 4: Blending
    15.5 Exercises
16. Understanding the Process
    16.1 Learning Curves and Speed
    16.2 Accuracy Tradeoffs
    16.3 Blueprints
        16.3.1 Numeric Data Cleansing (Imputation)
        16.3.2 Standardization
        16.3.3 One-hot Encoding
        16.3.4 Ordinal Encoding
        16.3.5 Matrix of Word-gram Occurrences
        16.3.6 Classification
    16.4 Hyperparameter Optimization (Advanced Content)
    16.5 Exercises
17. Evaluate Model Performance
    17.1 Introduction
    17.2 A Sample Algorithm and Model
    17.3 ROC Curve
    17.4 Using the Lift Chart and Profit Curve for Business Decisions
    17.5 Exercises
18. Comparing Model Pairs
    18.1 Model Comparison
    18.2 Prioritizing Modeling Criteria and Selecting a Model
    18.3 Exercises

Section V: Interpret and Communicate
19. Interpret Model
    19.1 Feature Impacts on Target
    19.2 The Overall Impact of Features on the Target without Consideration of Other Features
    19.3 The Overall Impact of a Feature Adjusted for the Impact of Other Features
    19.4 The Directional Impact of Features on Target
    19.5 The Partial Impact of Features on Target
    19.6 The Power of Language
    19.7 Hotspots
    19.8 Prediction Explanations
    19.9 Exercises
20. Communicate Model Insights
    20.1 Unlocking Holdout
    20.2 Business Problem First
    20.3 Pre-processing and Model Quality Metrics
    20.4 Areas Where the Model Struggles
    20.5 Most Predictive Features
    20.6 Not All Features Are Created Equal
    20.7 Recommended Business Actions
    20.8 Exercises

Section VI: Implement, Document and Maintain
21. Set Up Prediction System
    21.1 Retraining Model
    21.2 Choose Deployment Strategy
    21.3 Exercises
22. Document Modeling Process for Reproducibility
    22.1 Model Documentation
    22.2 Exercises
23. Create Model Monitoring and Maintenance Plan
    23.1 Potential Problems
    23.2 Strategies
    23.3 Exercises
24. Seven Types of Target Leakage in Machine Learning and an Exercise
    24.1 Types of Target Leakage
    24.2 A Hands-on Exercise in Detecting Target Leakage
    24.3 Exercises
25. Time-Aware Modeling
    25.1 An Example of Time-Aware Modeling
        25.1.1 Problem Statement
        25.1.2 Data
        25.1.3 Initialize Analysis
        25.1.4 Time-Aware Modeling Background
        25.1.5 Data Preparation
        25.1.6 Model Building and Residuals
        25.1.7 Candidate Models
        25.1.8 Selecting and Examining a Model
        25.1.9 A Small Detour into Residuals
        25.1.10 Model Value
        25.1.11 Learning about Avocado Price Drivers
    25.2 Exercises
26. Time-Series Modeling
    26.1 The Assumptions of Time-Series Machine Learning
    26.2 A Hands-on Exercise in Time-Series Analysis
        26.2.1 Problem Context
        26.2.2 Loading Data
        26.2.3 Specify Time Unit and Generate Features
        26.2.3 Examine Candidate Models
        26.2.4 Digging into the Preferred Model
        26.2.5 Predicting
    26.3 Exercises

Appendix A. Datasets
    A.1 Diabetes Patients Readmissions (Summary, Business Goal, Datasets, Exercises, Rights)
    A.2 Luxury Shoes (Summary, Business Goal, Datasets, Exercises)
    A.3 Boston Airbnb (Summary, Business Goal, Datasets, Rights)
    A.4 Part Backorders (Summary, Business Goal, Datasets, Exercises, Rights)
    A.5 Student Grades Portuguese (Summary, Business Goal, Datasets, Exercises, Rights)
    A.6 Lending Club (Summary, Business Goal, Dataset, Rights)
    A.7 College Starting Salaries (Summary, Business Goal, Datasets, Exercises, Rights)
    A.8 HR Attrition (Summary, Business Goal, Datasets, Exercises, Rights)
    A.9 Avocadopocalypse Now? (Summary, Business Goal, Datasets, Exercises, Rights)
Appendix B. Optimization and Sorting Measures
Appendix C. More on Cross Validation
References
Index