Modern Data Science with R, 2nd Edition
- Length: 632 pages
- Edition: 2
- Language: English
- Publisher: Chapman and Hall/CRC
- Publication Date: 2021-04-14
- ISBN-10: 0367191490
- ISBN-13: 9780367191498
- Sales Rank: #167742
From a review of the first edition: “Modern Data Science with R… is rich with examples and is guided by a strong narrative voice. What’s more, it presents an organizing framework that makes a convincing argument that data science is a course distinct from applied statistics” (The American Statistician).
Modern Data Science with R is a comprehensive data science textbook for undergraduates that incorporates statistical and computational thinking to solve real-world data problems. Rather than focus exclusively on case studies or programming syntax, this book illustrates how statistical programming in the state-of-the-art R/RStudio computing environment can be leveraged to extract meaningful information from a variety of data in the service of addressing compelling questions.
The second edition is updated to reflect the growing influence of the tidyverse set of packages. All code in the book has been revised and styled to be more readable and easier to understand. New functionality from packages like sf, purrr, tidymodels, and tidytext is now integrated into the text. All chapters have been revised, and several have been split, re-organized, or re-imagined to meet the shifting landscape of best practice.
Table of contents:

Cover
Half Title
Series Page
Title Page
Copyright Page
Contents
About the Authors
Preface

Part I: Introduction to Data Science
1 Prologue: Why data science?
1.1 What is data science?
1.2 Case study: The evolution of sabermetrics
1.3 Datasets
1.4 Further resources
2 Data visualization
2.1 The 2012 federal election cycle
2.2 Composing data graphics
2.3 Importance of data graphics: Challenger
2.4 Creating effective presentations
2.5 The wider world of data visualization
2.6 Further resources
2.7 Exercises
2.8 Supplementary exercises
3 A grammar for graphics
3.1 A grammar for data graphics
3.2 Canonical data graphics in R
3.3 Extended example: Historical baby names
3.4 Further resources
3.5 Exercises
3.6 Supplementary exercises
4 Data wrangling on one table
4.1 A grammar for data wrangling
4.2 Extended example: Ben's time with the Mets
4.3 Further resources
4.4 Exercises
4.5 Supplementary exercises
5 Data wrangling on multiple tables
5.1 inner_join()
5.2 left_join()
5.3 Extended example: Manny Ramirez
5.4 Further resources
5.5 Exercises
5.6 Supplementary exercises
6 Tidy data
6.1 Tidy data
6.2 Reshaping data
6.3 Naming conventions
6.4 Data intake
6.5 Further resources
6.6 Exercises
6.7 Supplementary exercises
7 Iteration
7.1 Vectorized operations
7.2 Using across() with dplyr functions
7.3 The map() family of functions
7.4 Iterating over a one-dimensional vector
7.5 Iteration over subgroups
7.6 Simulation
7.7 Extended example: Factors associated with BMI
7.8 Further resources
7.9 Exercises
7.10 Supplementary exercises
8 Data science ethics
8.1 Introduction
8.2 Truthful falsehoods
8.3 Role of data science in society
8.4 Some settings for professional ethics
8.5 Some principles to guide ethical action
8.6 Algorithmic bias
8.7 Data and disclosure
8.8 Reproducibility
8.9 Ethics, collectively
8.10 Professional guidelines for ethical conduct
8.11 Further resources
8.12 Exercises
8.13 Supplementary exercises

Part II: Statistics and Modeling
9 Statistical foundations
9.1 Samples and populations
9.2 Sample statistics
9.3 The bootstrap
9.4 Outliers
9.5 Statistical models: Explaining variation
9.6 Confounding and accounting for other factors
9.7 The perils of p-values
9.8 Further resources
9.9 Exercises
9.10 Supplementary exercises
10 Predictive modeling
10.1 Predictive modeling
10.2 Simple classification models
10.3 Evaluating models
10.4 Extended example: Who has diabetes?
10.5 Further resources
10.6 Exercises
10.7 Supplementary exercises
11 Supervised learning
11.1 Non-regression classifiers
11.2 Parameter tuning
11.3 Example: Evaluation of income models redux
11.4 Extended example: Who has diabetes this time?
11.5 Regularization
11.6 Further resources
11.7 Exercises
11.8 Supplementary exercises
12 Unsupervised learning
12.1 Clustering
12.2 Dimension reduction
12.3 Further resources
12.4 Exercises
12.5 Supplementary exercises
13 Simulation
13.1 Reasoning in reverse
13.2 Extended example: Grouping cancers
13.3 Randomizing functions
13.4 Simulating variability
13.5 Random networks
13.6 Key principles of simulation
13.7 Further resources
13.8 Exercises
13.9 Supplementary exercises

Part III: Topics in Data Science
14 Dynamic and customized data graphics
14.1 Rich Web content using D3.js and htmlwidgets
14.2 Animation
14.3 Flexdashboard
14.4 Interactive web apps with Shiny
14.5 Customization of ggplot2 graphics
14.6 Extended example: Hot dog eating
14.7 Further resources
14.8 Exercises
14.9 Supplementary exercises
15 Database querying using SQL
15.1 From dplyr to SQL
15.2 Flat-file databases
15.3 The SQL universe
15.4 The SQL data manipulation language
15.5 Extended example: FiveThirtyEight flights
15.6 SQL vs. R
15.7 Further resources
15.8 Exercises
15.9 Supplementary exercises
16 Database administration
16.1 Constructing efficient SQL databases
16.2 Changing SQL data
16.3 Extended example: Building a database
16.4 Scalability
16.5 Further resources
16.6 Exercises
16.7 Supplementary exercises
17 Working with geospatial data
17.1 Motivation: What's so great about geospatial data?
17.2 Spatial data structures
17.3 Making maps
17.4 Extended example: Congressional districts
17.5 Effective maps: How (not) to lie
17.6 Projecting polygons
17.7 Playing well with others
17.8 Further resources
17.9 Exercises
17.10 Supplementary exercises
18 Geospatial computations
18.1 Geospatial operations
18.2 Geospatial aggregation
18.3 Geospatial joins
18.4 Extended example: Trail elevations at MacLeish
18.5 Further resources
18.6 Exercises
18.7 Supplementary exercises
19 Text as data
19.1 Regular expressions using Macbeth
19.2 Extended example: Analyzing textual data from arXiv.org
19.3 Ingesting text
19.4 Further resources
19.5 Exercises
19.6 Supplementary exercises
20 Network science
20.1 Introduction to network science
20.2 Extended example: Six degrees of Kristen Stewart
20.3 PageRank
20.4 Extended example: 1996 men's college basketball
20.5 Further resources
20.6 Exercises
20.7 Supplementary exercises
21 Epilogue: Towards “big data”
21.1 Notions of big data
21.2 Tools for bigger data
21.3 Alternatives to R
21.4 Closing thoughts
21.5 Further resources

Part IV: Appendices
A Packages used in this book
A.1 The mdsr package
A.2 Other packages
A.3 Further resources
B Introduction to R and RStudio
B.1 Installation
B.2 Learning R
B.3 Fundamental structures and objects
B.4 Add-ons: Packages
B.5 Further resources
B.6 Exercises
B.7 Supplementary exercises
C Algorithmic thinking
C.1 Introduction
C.2 Simple example
C.3 Extended example: Law of large numbers
C.4 Non-standard evaluation
C.5 Debugging and defensive coding
C.6 Further resources
C.7 Exercises
C.8 Supplementary exercises
D Reproducible analysis and workflow
D.1 Scriptable statistical computing
D.2 Reproducible analysis with R Markdown
D.3 Projects and version control
D.4 Further resources
D.5 Exercises
D.6 Supplementary exercises
E Regression modeling
E.1 Simple linear regression
E.2 Multiple regression
E.3 Inference for regression
E.4 Assumptions underlying regression
E.5 Logistic regression
E.6 Further resources
E.7 Exercises
E.8 Supplementary exercises
F Setting up a database server
F.1 SQLite
F.2 MySQL
F.3 PostgreSQL
F.4 Connecting to SQL

Bibliography
Indices
Subject index
R index