Tidy Modeling with R: A Framework for Modeling in the Tidyverse

Length: 381 pages
Edition: 1
Language: English
Publisher: O'Reilly Media
Publication Date: 2022-08-23
ISBN-10: 1492096482
ISBN-13: 9781492096481

Get going with tidymodels, a collection of R packages for modeling and machine learning. Whether you’re just starting out or have years of experience with modeling, this practical introduction shows data analysts, business analysts, and data scientists how the tidymodels framework offers a consistent, flexible approach to their work.

RStudio engineers Max Kuhn and Julia Silge demonstrate ways to create models by focusing on an R dialect called the tidyverse. Software that adopts tidyverse principles shares both a high-level design philosophy and low-level grammar and data structures, so learning one piece of the ecosystem makes it easier to learn the next. You’ll understand why the tidymodels framework has been built to be used by a broad range of people.

With this book, you will:

Learn the steps necessary to build a model from beginning to end
Understand how to use different modeling and feature engineering approaches fluently
Examine the options for avoiding common pitfalls of modeling, such as overfitting
Learn practical methods to prepare your data for modeling
Tune models for optimal performance
Use good statistical practices to compare, evaluate, and choose among models

Preface
    Conventions Used in This Book
    Using Code Examples
    O’Reilly Online Learning
    How to Contact Us
    Acknowledgments
I. Introduction
1. Software for Modeling
    Fundamentals for Modeling Software
    Types of Models
        Descriptive Models
        Inferential Models
        Predictive Models
    Connections Between Types of Models
    Some Terminology
    How Does Modeling Fit into the Data Analysis Process?
    Chapter Summary
2. A Tidyverse Primer
    Tidyverse Principles
        Design for Humans
        Reuse Existing Data Structures
        Design for the Pipe and Functional Programming
    Examples of Tidyverse Syntax
    Chapter Summary
3. A Review of R Modeling Fundamentals
    An Example
    What Does the R Formula Do?
    Why Tidiness Is Important for Modeling
    Combining Base R Models and the Tidyverse
    The tidymodels Metapackage
    Chapter Summary
II. Modeling Basics
4. The Ames Housing Data
    Exploring Features of Homes in Ames
    Chapter Summary
5. Spending Our Data
    Common Methods for Splitting Data
    What About a Validation Set?
    Multilevel Data
    Other Considerations for a Data Budget
    Chapter Summary
6. Fitting Models with parsnip
    Create a Model
    Use the Model Results
    Make Predictions
    parsnip-Extension Packages
    Creating Model Specifications
    Chapter Summary
7. A Model Workflow
    Where Does the Model Begin and End?
    Workflow Basics
    Adding Raw Variables to the workflow()
    How Does a workflow() Use the Formula?
        Tree-Based Models
        Special Formulas and Inline Functions
    Creating Multiple Workflows at Once
    Evaluating the Test Set
    Chapter Summary
8. Feature Engineering with Recipes
    A Simple recipe() for the Ames Housing Data
    Using Recipes
    How Data Are Used by the recipe()
    Examples of Steps
        Encoding Qualitative Data in a Numeric Format
        Interaction Terms
        Spline Functions
        Feature Extraction
        Row Sampling Steps
        General Transformations
        Natural Language Processing
    Skipping Steps for New Data
    Tidy a recipe()
    Column Roles
    Chapter Summary
9. Judging Model Effectiveness
    Performance Metrics and Inference
    Regression Metrics
    Binary Classification Metrics
    Multiclass Classification Metrics
    Chapter Summary
III. Tools for Creating Effective Models
10. Resampling for Evaluating Performance
    The Resubstitution Approach
    Resampling Methods
        Cross-Validation
        Repeated Cross-Validation
        Leave-One-Out Cross-Validation
        Monte Carlo Cross-Validation
        Validation Sets
        Bootstrapping
        Rolling Forecasting Origin Resampling
    Estimating Performance
    Parallel Processing
    Saving the Resampled Objects
    Chapter Summary
11. Comparing Models with Resampling
    Creating Multiple Models with Workflow Sets
    Comparing Resampled Performance Statistics
    Simple Hypothesis Testing Methods
    Bayesian Methods
        A Random Intercept Model
        The Effect of the Amount of Resampling
    Chapter Summary
12. Model Tuning and the Dangers of Overfitting
    Model Parameters
    Tuning Parameters for Different Types of Models
    What Do We Optimize?
    The Consequences of Poor Parameter Estimates
    Two General Strategies for Optimization
    Tuning Parameters in tidymodels
    Chapter Summary
13. Grid Search
    Regular and Nonregular Grids
        Regular Grids
        Nonregular Grids
    Evaluating the Grid
    Finalizing the Model
    Tools for Creating Tuning Specifications
    Tools for Efficient Grid Search
        Submodel Optimization
        Parallel Processing
        Benchmarking Boosted Trees
        Access to Global Variables
        Racing Methods
    Chapter Summary
14. Iterative Search
    A Support Vector Machine Model
    Bayesian Optimization
        A Gaussian Process Model
        Acquisition Functions
        The tune_bayes() Function
    Simulated Annealing
        Simulated Annealing Search Process
        The tune_sim_anneal() Function
    Chapter Summary
15. Screening Many Models
    Modeling Concrete Mixture Strength
    Creating the Workflow Set
    Tuning and Evaluating the Models
    Efficiently Screening Models
    Finalizing a Model
    Chapter Summary
IV. Beyond the Basics
16. Dimensionality Reduction
    What Problems Can Dimensionality Reduction Solve?
    A Picture Is Worth a Thousand…Beans
    A Starter Recipe
    Recipes in the Wild
        Preparing a Recipe
        Baking the Recipe
    Feature Extraction Techniques
        Principal Component Analysis
        Partial Least Squares
        Independent Component Analysis
        Uniform Manifold Approximation and Projection
    Modeling
    Chapter Summary
17. Encoding Categorical Data
    Is an Encoding Necessary?
    Encoding Ordinal Predictors
    Using the Outcome for Encoding Predictors
        Effect Encodings in tidymodels
        Effect Encodings with Partial Pooling
    Feature Hashing
    More Encoding Options
    Chapter Summary
18. Explaining Models and Predictions
    Software for Model Explanations
    Local Explanations
    Global Explanations
    Building Global Explanations from Local Explanations
    Back to Beans!
    Chapter Summary
19. When Should You Trust Your Predictions?
    Equivocal Results
    Determining Model Applicability
    Chapter Summary
20. Ensembles of Models
    Creating the Training Set for Stacking
    Blend the Predictions
    Fit the Member Models
    Test Set Results
    Chapter Summary
21. Inferential Analysis
    Inference for Count Data
    Comparisons with Two-Sample Tests
    Log-Linear Models
    A More Complex Model
    More Inferential Analysis
    Chapter Summary
A. Recommended Preprocessing
References
Index
About the Authors

How to download the source code:

1. Go to https://www.oreilly.com/.

2. Search for the book title, Tidy Modeling with R: A Framework for Modeling in the Tidyverse. If the full title returns no results, search for the main title only.

3. Click the book title in the search results.

4. In the Publisher resources section, click Download Example Code.