Modern Data Science with R, 2nd Edition

by Benjamin S. Baumer, Daniel T. Kaplan, Nicholas J. Horton

Length: 632 pages
Edition: 2
Language: English
Publisher: Chapman and Hall/CRC
Publication Date: 2021-04-14
ISBN-10: 0367191490
ISBN-13: 9780367191498
Sales Rank: #167742

From a review of the first edition: “Modern Data Science with R… is rich with examples and is guided by a strong narrative voice. What’s more, it presents an organizing framework that makes a convincing argument that data science is a course distinct from applied statistics” (The American Statistician).

Modern Data Science with R is a comprehensive data science textbook for undergraduates that incorporates statistical and computational thinking to solve real-world data problems. Rather than focus exclusively on case studies or programming syntax, this book illustrates how statistical programming in the state-of-the-art R/RStudio computing environment can be leveraged to extract meaningful information from a variety of data in the service of addressing compelling questions.

The second edition is updated to reflect the growing influence of the tidyverse set of packages. All code in the book has been revised and styled to be more readable and easier to understand. New functionality from packages like sf, purrr, tidymodels, and tidytext is now integrated into the text. All chapters have been revised, and several have been split, re-organized, or re-imagined to meet the shifting landscape of best practice.

Cover
Half Title
Series Page
Title Page
Copyright Page
Contents
About the Authors
Preface
I Part I: Introduction to Data Science
    1 Prologue: Why data science?
        1.1  What is data science?
        1.2  Case study: The evolution of sabermetrics
        1.3  Datasets
        1.4  Further resources
    2 Data visualization
        2.1  The 2012 federal election cycle
        2.2  Composing data graphics
        2.3  Importance of data graphics: Challenger
        2.4  Creating effective presentations
        2.5  The wider world of data visualization
        2.6  Further resources
        2.7  Exercises
        2.8  Supplementary exercises
    3 A grammar for graphics
        3.1  A grammar for data graphics
        3.2  Canonical data graphics in R
        3.3  Extended example: Historical baby names
        3.4  Further resources
        3.5  Exercises
        3.6  Supplementary exercises
    4 Data wrangling on one table
        4.1  A grammar for data wrangling
        4.2  Extended example: Ben's time with the Mets
        4.3  Further resources
        4.4  Exercises
        4.5  Supplementary exercises
    5 Data wrangling on multiple tables
        5.1  inner_join()
        5.2  left_join()
        5.3  Extended example: Manny Ramirez
        5.4  Further resources
        5.5  Exercises
        5.6  Supplementary exercises
    6 Tidy data
        6.1  Tidy data
        6.2  Reshaping data
        6.3  Naming conventions
        6.4  Data intake
        6.5  Further resources
        6.6  Exercises
        6.7  Supplementary exercises
    7 Iteration
        7.1  Vectorized operations
        7.2  Using across() with dplyr functions
        7.3  The map() family of functions
        7.4  Iterating over a one-dimensional vector
        7.5  Iteration over subgroups
        7.6  Simulation
        7.7  Extended example: Factors associated with BMI
        7.8  Further resources
        7.9  Exercises
        7.10 Supplementary exercises
    8 Data science ethics
        8.1  Introduction
        8.2  Truthful falsehoods
        8.3  Role of data science in society
        8.4  Some settings for professional ethics
        8.5  Some principles to guide ethical action
        8.6  Algorithmic bias
        8.7  Data and disclosure
        8.8  Reproducibility
        8.9  Ethics, collectively
        8.10 Professional guidelines for ethical conduct
        8.11 Further resources
        8.12 Exercises
        8.13 Supplementary exercises
II Part II: Statistics and Modeling
    9 Statistical foundations
        9.1  Samples and populations
        9.2  Sample statistics
        9.3  The bootstrap
        9.4  Outliers
        9.5  Statistical models: Explaining variation
        9.6  Confounding and accounting for other factors
        9.7  The perils of p-values
        9.8  Further resources
        9.9  Exercises
        9.10 Supplementary exercises
    10 Predictive modeling
        10.1 Predictive modeling
        10.2 Simple classification models
        10.3 Evaluating models
        10.4 Extended example: Who has diabetes?
        10.5 Further resources
        10.6 Exercises
        10.7 Supplementary exercises
    11 Supervised learning
        11.1 Non-regression classifiers
        11.2 Parameter tuning
        11.3 Example: Evaluation of income models redux
        11.4 Extended example: Who has diabetes this time?
        11.5 Regularization
        11.6 Further resources
        11.7 Exercises
        11.8 Supplementary exercises
    12 Unsupervised learning
        12.1 Clustering
        12.2 Dimension reduction
        12.3 Further resources
        12.4 Exercises
        12.5 Supplementary exercises
    13 Simulation
        13.1 Reasoning in reverse
        13.2 Extended example: Grouping cancers
        13.3 Randomizing functions
        13.4 Simulating variability
        13.5 Random networks
        13.6 Key principles of simulation
        13.7 Further resources
        13.8 Exercises
        13.9 Supplementary exercises
III Part III: Topics in Data Science
    14 Dynamic and customized data graphics
        14.1 Rich Web content using D3.js and htmlwidgets
        14.2 Animation
        14.3 Flexdashboard
        14.4 Interactive web apps with Shiny
        14.5 Customization of ggplot2 graphics
        14.6 Extended example: Hot dog eating
        14.7 Further resources
        14.8 Exercises
        14.9 Supplementary exercises
    15 Database querying using SQL
        15.1 From dplyr to SQL
        15.2 Flat-file databases
        15.3 The SQL universe
        15.4 The SQL data manipulation language
        15.5 Extended example: FiveThirtyEight flights
        15.6 SQL vs. R
        15.7 Further resources
        15.8 Exercises
        15.9 Supplementary exercises
    16 Database administration
        16.1 Constructing efficient SQL databases
        16.2 Changing SQL data
        16.3 Extended example: Building a database
        16.4 Scalability
        16.5 Further resources
        16.6 Exercises
        16.7 Supplementary exercises
    17 Working with geospatial data
        17.1 Motivation: What's so great about geospatial data?
        17.2 Spatial data structures
        17.3 Making maps
        17.4 Extended example: Congressional districts
        17.5 Effective maps: How (not) to lie
        17.6 Projecting polygons
        17.7 Playing well with others
        17.8 Further resources
        17.9 Exercises
        17.10 Supplementary exercises
    18 Geospatial computations
        18.1 Geospatial operations
        18.2 Geospatial aggregation
        18.3 Geospatial joins
        18.4 Extended example: Trail elevations at MacLeish
        18.5 Further resources
        18.6 Exercises
        18.7 Supplementary exercises
    19 Text as data
        19.1 Regular expressions using Macbeth
        19.2 Extended example: Analyzing textual data from arXiv.org
        19.3 Ingesting text
        19.4 Further resources
        19.5 Exercises
        19.6 Supplementary exercises
    20 Network science
        20.1 Introduction to network science
        20.2 Extended example: Six degrees of Kristen Stewart
        20.3 PageRank
        20.4 Extended example: 1996 men's college basketball
        20.5 Further resources
        20.6 Exercises
        20.7 Supplementary exercises
    21 Epilogue: Towards “big data”
        21.1 Notions of big data
        21.2 Tools for bigger data
        21.3 Alternatives to R
        21.4 Closing thoughts
        21.5 Further resources
IV Part IV: Appendices
    A  Packages used in this book
        A.1  The mdsr package
        A.2  Other packages
        A.3  Further resources
    B  Introduction to R and RStudio
        B.1  Installation
        B.2  Learning R
        B.3  Fundamental structures and objects
        B.4  Add-ons: Packages
        B.5  Further resources
        B.6  Exercises
        B.7  Supplementary exercises
    C  Algorithmic thinking
        C.1  Introduction
        C.2  Simple example
        C.3  Extended example: Law of large numbers
        C.4  Non-standard evaluation
        C.5  Debugging and defensive coding
        C.6  Further resources
        C.7  Exercises
        C.8  Supplementary exercises
    D  Reproducible analysis and workflow
        D.1  Scriptable statistical computing
        D.2  Reproducible analysis with R Markdown
        D.3  Projects and version control
        D.4  Further resources
        D.5  Exercises
        D.6  Supplementary exercises
    E  Regression modeling
        E.1  Simple linear regression
        E.2  Multiple regression
        E.3  Inference for regression
        E.4  Assumptions underlying regression
        E.5  Logistic regression
        E.6  Further resources
        E.7  Exercises
        E.8  Supplementary exercises
    F  Setting up a database server
        F.1  SQLite
        F.2  MySQL
        F.3  PostgreSQL
        F.4  Connecting to SQL
Bibliography
Indices
    Subject index
    R index