Tidy Modeling with R: A Framework for Modeling in the Tidyverse
- Length: 381 pages
- Edition: 1
- Language: English
- Publisher: O'Reilly Media
- Publication Date: 2022-08-23
- ISBN-10: 1492096482
- ISBN-13: 9781492096481
- Sales Rank: #936974
Get going with tidymodels, a collection of R packages for modeling and machine learning. Whether you’re just starting out or have years of experience with modeling, this practical introduction shows data analysts, business analysts, and data scientists how the tidymodels framework offers a consistent, flexible approach for your work.
RStudio engineers Max Kuhn and Julia Silge demonstrate ways to create models by focusing on an R dialect called the tidyverse. Software that adopts tidyverse principles shares both a high-level design philosophy and low-level grammar and data structures, so learning one piece of the ecosystem makes it easier to learn the next. You’ll understand why the tidymodels framework has been built to be used by a broad range of people.
With this book, you will:
- Learn the steps necessary to build a model from beginning to end
- Understand how to use different modeling and feature engineering approaches fluently
- Examine the options for avoiding common pitfalls of modeling, such as overfitting
- Learn practical methods to prepare your data for modeling
- Tune models for optimal performance
- Use good statistical practices to compare, evaluate, and choose among models
Preface: Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
I. Introduction
1. Software for Modeling: Fundamentals for Modeling Software; Types of Models; Descriptive Models; Inferential Models; Predictive Models; Connections Between Types of Models; Some Terminology; How Does Modeling Fit into the Data Analysis Process?; Chapter Summary
2. A Tidyverse Primer: Tidyverse Principles; Design for Humans; Reuse Existing Data Structures; Design for the Pipe and Functional Programming; Examples of Tidyverse Syntax; Chapter Summary
3. A Review of R Modeling Fundamentals: An Example; What Does the R Formula Do?; Why Tidiness Is Important for Modeling; Combining Base R Models and the Tidyverse; The tidymodels Metapackage; Chapter Summary
II. Modeling Basics
4. The Ames Housing Data: Exploring Features of Homes in Ames; Chapter Summary
5. Spending Our Data: Common Methods for Splitting Data; What About a Validation Set?; Multilevel Data; Other Considerations for a Data Budget; Chapter Summary
6. Fitting Models with parsnip: Create a Model; Use the Model Results; Make Predictions; parsnip-Extension Packages; Creating Model Specifications; Chapter Summary
7. A Model Workflow: Where Does the Model Begin and End?; Workflow Basics; Adding Raw Variables to the workflow(); How Does a workflow() Use the Formula?; Tree-Based Models; Special Formulas and Inline Functions; Creating Multiple Workflows at Once; Evaluating the Test Set; Chapter Summary
8. Feature Engineering with Recipes: A Simple recipe() for the Ames Housing Data; Using Recipes; How Data Are Used by the recipe(); Examples of Steps; Encoding Qualitative Data in a Numeric Format; Interaction Terms; Spline Functions; Feature Extraction; Row Sampling Steps; General Transformations; Natural Language Processing; Skipping Steps for New Data; Tidy a recipe(); Column Roles; Chapter Summary
9. Judging Model Effectiveness: Performance Metrics and Inference; Regression Metrics; Binary Classification Metrics; Multiclass Classification Metrics; Chapter Summary
III. Tools for Creating Effective Models
10. Resampling for Evaluating Performance: The Resubstitution Approach; Resampling Methods; Cross-Validation; Repeated Cross-Validation; Leave-One-Out Cross-Validation; Monte Carlo Cross-Validation; Validation Sets; Bootstrapping; Rolling Forecasting Origin Resampling; Estimating Performance; Parallel Processing; Saving the Resampled Objects; Chapter Summary
11. Comparing Models with Resampling: Creating Multiple Models with Workflow Sets; Comparing Resampled Performance Statistics; Simple Hypothesis Testing Methods; Bayesian Methods; A Random Intercept Model; The Effect of the Amount of Resampling; Chapter Summary
12. Model Tuning and the Dangers of Overfitting: Model Parameters; Tuning Parameters for Different Types of Models; What Do We Optimize?; The Consequences of Poor Parameter Estimates; Two General Strategies for Optimization; Tuning Parameters in tidymodels; Chapter Summary
13. Grid Search: Regular and Nonregular Grids; Regular Grids; Nonregular Grids; Evaluating the Grid; Finalizing the Model; Tools for Creating Tuning Specifications; Tools for Efficient Grid Search; Submodel Optimization; Parallel Processing; Benchmarking Boosted Trees; Access to Global Variables; Racing Methods; Chapter Summary
14. Iterative Search: A Support Vector Machine Model; Bayesian Optimization; A Gaussian Process Model; Acquisition Functions; The tune_bayes() Function; Simulated Annealing; Simulated Annealing Search Process; The tune_sim_anneal() Function; Chapter Summary
15. Screening Many Models: Modeling Concrete Mixture Strength; Creating the Workflow Set; Tuning and Evaluating the Models; Efficiently Screening Models; Finalizing a Model; Chapter Summary
IV. Beyond the Basics
16. Dimensionality Reduction: What Problems Can Dimensionality Reduction Solve?; A Picture Is Worth a Thousand…Beans; A Starter Recipe; Recipes in the Wild; Preparing a Recipe; Baking the Recipe; Feature Extraction Techniques; Principal Component Analysis; Partial Least Squares; Independent Component Analysis; Uniform Manifold Approximation and Projection; Modeling; Chapter Summary
17. Encoding Categorical Data: Is an Encoding Necessary?; Encoding Ordinal Predictors; Using the Outcome for Encoding Predictors; Effect Encodings in tidymodels; Effect Encodings with Partial Pooling; Feature Hashing; More Encoding Options; Chapter Summary
18. Explaining Models and Predictions: Software for Model Explanations; Local Explanations; Global Explanations; Building Global Explanations from Local Explanations; Back to Beans!; Chapter Summary
19. When Should You Trust Your Predictions?: Equivocal Results; Determining Model Applicability; Chapter Summary
20. Ensembles of Models: Creating the Training Set for Stacking; Blend the Predictions; Fit the Member Models; Test Set Results; Chapter Summary
21. Inferential Analysis: Inference for Count Data; Comparisons with Two-Sample Tests; Log-Linear Models; A More Complex Model; More Inferential Analysis; Chapter Summary
A. Recommended Preprocessing
References
Index
About the Authors
How to download source code?
1. Go to: https://www.oreilly.com/
2. Search for the book title: Tidy Modeling with R: A Framework for Modeling in the Tidyverse. If the search returns no results, search for the main title only.
3. Click the book title in the search results.
4. In the Publisher resources section, click Download Example Code.