R in Action, Third Edition: Data analysis and graphics with R and Tidyverse
R is the most powerful tool you can use for statistical analysis. This definitive guide smooths R’s steep learning curve with practical solutions and real-world applications for commercial environments.
In R in Action, Third Edition you will learn how to:
- Set up and install R and RStudio
- Clean, manage, and analyze data with R
- Use the ggplot2 package for graphs and visualizations
- Solve data management problems using R functions
- Fit and interpret regression models
- Test hypotheses and estimate confidence intervals
- Simplify complex multivariate data with principal components and exploratory factor analysis
- Make predictions using time series forecasting
- Create dynamic reports and stunning visualizations
- Debug programs and create packages
R in Action, Third Edition makes learning R quick and easy. That’s why thousands of data scientists have chosen this guide to help them master the powerful language. Far from being a dry academic tome, this book makes every example relevant to scientific and business developers and helps you solve common data challenges. R expert Rob Kabacoff takes you on a crash course in statistics, from dealing with messy and incomplete data to creating stunning visualizations. This revised and expanded third edition contains fresh coverage of the new tidyverse approach to data analysis and R’s state-of-the-art graphing capabilities with the ggplot2 package.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology
Used daily by data scientists, researchers, and quants of all types, R is the gold standard for statistical data analysis. This free and open source language includes packages for everything from advanced data visualization to deep learning. Instantly comfortable for mathematically minded users, R easily handles practical problems without forcing you to think like a software engineer.
About the book
R in Action, Third Edition teaches you how to do statistical analysis and data visualization using R and its popular tidyverse packages. In it, you’ll investigate real-world data challenges, including forecasting, data mining, and dynamic report writing. This revised third edition adds new coverage for graphing with ggplot2, along with examples for machine learning topics like clustering, classification, and time series analysis.
What’s inside
- Clean, manage, and analyze data
- Use the ggplot2 package for graphs and visualizations
- Techniques for debugging programs and creating packages
- A complete learning resource for R and tidyverse
About the reader
Requires basic math and statistics. No prior experience with R needed.
About the author
Dr. Robert I. Kabacoff is a professor of quantitative analytics at Wesleyan University and a seasoned data scientist with more than 20 years of experience.
R in Action
Copyright
Praise for the previous edition of R in Action
brief contents
contents
Front matter
  preface
  acknowledgments
  about this book
    What's new in the third edition
    Who should read this book
    How this book is organized: A road map
    Advice for data miners
    About the code
    liveBook discussion forum
  about the author
  about the cover illustration
Part 1. Getting started
1 Introduction to R
  1.1 Why use R?
  1.2 Obtaining and installing R
  1.3 Working with R
    1.3.1 Getting started
    1.3.2 Using RStudio
    1.3.3 Getting help
    1.3.4 The workspace
    1.3.5 Projects
  1.4 Packages
    1.4.1 What are packages?
    1.4.2 Installing a package
    1.4.3 Loading a package
    1.4.4 Learning about a package
  1.5 Using output as input: Reusing results
  1.6 Working with large datasets
  1.7 Working through an example
  Summary
2 Creating a dataset
  2.1 Understanding datasets
  2.2 Data structures
    2.2.1 Vectors
    2.2.2 Matrices
    2.2.3 Arrays
    2.2.4 Data frames
    2.2.5 Factors
    2.2.6 Lists
    2.2.7 Tibbles
  2.3 Data input
    2.3.1 Entering data from the keyboard
    2.3.2 Importing data from a delimited text file
    2.3.3 Importing data from Excel
    2.3.4 Importing data from JSON
    2.3.5 Importing data from the web
    2.3.6 Importing data from SPSS
    2.3.7 Importing data from SAS
    2.3.8 Importing data from Stata
    2.3.9 Accessing database management systems
    2.3.10 Importing data via Stat/Transfer
  2.4 Annotating datasets
    2.4.1 Variable labels
    2.4.2 Value labels
  2.5 Useful functions for working with data objects
  Summary
3 Basic data management
  3.1 A working example
  3.2 Creating new variables
  3.3 Recoding variables
  3.4 Renaming variables
  3.5 Missing values
    3.5.1 Recoding values to missing
    3.5.2 Excluding missing values from analyses
  3.6 Date values
    3.6.1 Converting dates to character variables
    3.6.2 Going further
  3.7 Type conversions
  3.8 Sorting data
  3.9 Merging datasets
    3.9.1 Adding columns to a data frame
    3.9.2 Adding rows to a data frame
  3.10 Subsetting datasets
    3.10.1 Selecting variables
    3.10.2 Dropping variables
    3.10.3 Selecting observations
    3.10.4 The subset() function
    3.10.5 Random samples
  3.11 Using dplyr to manipulate data frames
    3.11.1 Basic dplyr functions
    3.11.2 Using pipe operators to chain statements
  3.12 Using SQL statements to manipulate data frames
  Summary
4 Getting started with graphs
  4.1 Creating a graph with ggplot2
    4.1.1 ggplot
    4.1.2 Geoms
    4.1.3 Grouping
    4.1.4 Scales
    4.1.5 Facets
    4.1.6 Labels
    4.1.7 Themes
  4.2 ggplot2 details
    4.2.1 Placing the data and mapping options
    4.2.2 Graphs as objects
    4.2.3 Saving graphs
    4.2.4 Common mistakes
  Summary
5 Advanced data management
  5.1 A data management challenge
  5.2 Numerical and character functions
    5.2.1 Mathematical functions
    5.2.2 Statistical functions
    5.2.3 Probability functions
    5.2.4 Character functions
    5.2.5 Other useful functions
    5.2.6 Applying functions to matrices and data frames
    5.2.7 A solution for the data management challenge
  5.3 Control flow
    5.3.1 Repetition and looping
    5.3.2 Conditional execution
  5.4 User-written functions
  5.5 Reshaping data
    5.5.1 Transposing
    5.5.2 Converting from wide to long dataset formats
  5.6 Aggregating data
  Summary
Part 2. Basic methods
6 Basic graphs
  6.1 Bar charts
    6.1.1 Simple bar charts
    6.1.2 Stacked, grouped, and filled bar charts
    6.1.3 Mean bar charts
    6.1.4 Tweaking bar charts
  6.2 Pie charts
  6.3 Tree maps
  6.4 Histograms
  6.5 Kernel density plots
  6.6 Box plots
    6.6.1 Using parallel box plots to compare groups
    6.6.2 Violin plots
  6.7 Dot plots
  Summary
7 Basic statistics
  7.1 Descriptive statistics
    7.1.1 A menagerie of methods
    7.1.2 Even more methods
    7.1.3 Descriptive statistics by group
    7.1.4 Summarizing data interactively with dplyr
    7.1.5 Visualizing results
  7.2 Frequency and contingency tables
    7.2.1 Generating frequency tables
    7.2.2 Tests of independence
    7.2.3 Measures of association
    7.2.4 Visualizing results
  7.3 Correlations
    7.3.1 Types of correlations
    7.3.2 Testing correlations for significance
    7.3.3 Visualizing correlations
  7.4 T-tests
    7.4.1 Independent t-test
    7.4.2 Dependent t-test
    7.4.3 When there are more than two groups
  7.5 Nonparametric tests of group differences
    7.5.1 Comparing two groups
    7.5.2 Comparing more than two groups
  7.6 Visualizing group differences
  Summary
Part 3. Intermediate methods
8 Regression
  8.1 The many faces of regression
    8.1.1 Scenarios for using OLS regression
    8.1.2 What you need to know
  8.2 OLS regression
    8.2.1 Fitting regression models with lm()
    8.2.2 Simple linear regression
    8.2.3 Polynomial regression
    8.2.4 Multiple linear regression
    8.2.5 Multiple linear regression with interactions
  8.3 Regression diagnostics
    8.3.1 A typical approach
    8.3.2 An enhanced approach
    8.3.3 Multicollinearity
  8.4 Unusual observations
    8.4.1 Outliers
    8.4.2 High-leverage points
    8.4.3 Influential observations
  8.5 Corrective measures
    8.5.1 Deleting observations
    8.5.2 Transforming variables
    8.5.3 Adding or deleting variables
    8.5.4 Trying a different approach
  8.6 Selecting the “best” regression model
    8.6.1 Comparing models
    8.6.2 Variable selection
  8.7 Taking the analysis further
    8.7.1 Cross-validation
    8.7.2 Relative importance
  Summary
9 Analysis of variance
  9.1 A crash course on terminology
  9.2 Fitting ANOVA models
    9.2.1 The aov() function
    9.2.2 The order of formula terms
  9.3 One-way ANOVA
    9.3.1 Multiple comparisons
    9.3.2 Assessing test assumptions
  9.4 One-way ANCOVA
    9.4.1 Assessing test assumptions
    9.4.2 Visualizing the results
  9.5 Two-way factorial ANOVA
  9.6 Repeated measures ANOVA
  9.7 Multivariate analysis of variance (MANOVA)
    9.7.1 Assessing test assumptions
    9.7.2 Robust MANOVA
  9.8 ANOVA as regression
  Summary
10 Power analysis
  10.1 A quick review of hypothesis testing
  10.2 Implementing power analysis with the pwr package
    10.2.1 T-tests
    10.2.2 ANOVA
    10.2.3 Correlations
    10.2.4 Linear models
    10.2.5 Tests of proportions
    10.2.6 Chi-square tests
    10.2.7 Choosing an appropriate effect size in novel situations
  10.3 Creating power analysis plots
  10.4 Other packages
  Summary
11 Intermediate graphs
  11.1 Scatter plots
    11.1.1 Scatter plot matrices
    11.1.2 High-density scatter plots
    11.1.3 3D scatter plots
    11.1.4 Spinning 3D scatter plots
    11.1.5 Bubble plots
  11.2 Line charts
  11.3 Corrgrams
  11.4 Mosaic plots
  Summary
12 Resampling statistics and bootstrapping
  12.1 Permutation tests
  12.2 Permutation tests with the coin package
    12.2.1 Independent two-sample and k-sample tests
    12.2.2 Independence in contingency tables
    12.2.3 Independence between numeric variables
    12.2.4 Dependent two-sample and k-sample tests
    12.2.5 Going further
  12.3 Permutation tests with the lmPerm package
    12.3.1 Simple and polynomial regression
    12.3.2 Multiple regression
    12.3.3 One-way ANOVA and ANCOVA
    12.3.4 Two-way ANOVA
  12.4 Additional comments on permutation tests
  12.5 Bootstrapping
  12.6 Bootstrapping with the boot package
    12.6.1 Bootstrapping a single statistic
    12.6.2 Bootstrapping several statistics
  Summary
Part 4. Advanced methods
13 Generalized linear models
  13.1 Generalized linear models and the glm() function
    13.1.1 The glm() function
    13.1.2 Supporting functions
    13.1.3 Model fit and regression diagnostics
  13.2 Logistic regression
    13.2.1 Interpreting the model parameters
    13.2.2 Assessing the impact of predictors on the probability of an outcome
    13.2.3 Overdispersion
    13.2.4 Extensions
  13.3 Poisson regression
    13.3.1 Interpreting the model parameters
    13.3.2 Overdispersion
    13.3.3 Extensions
  Summary
14 Principal components and factor analysis
  14.1 Principal components and factor analysis in R
  14.2 Principal components
    14.2.1 Selecting the number of components to extract
    14.2.2 Extracting principal components
    14.2.3 Rotating principal components
    14.2.4 Obtaining principal component scores
  14.3 Exploratory factor analysis
    14.3.1 Deciding how many common factors to extract
    14.3.2 Extracting common factors
    14.3.3 Rotating factors
    14.3.4 Factor scores
    14.3.5 Other EFA-related packages
  14.4 Other latent variable models
  Summary
15 Time series
  15.1 Creating a time-series object in R
  15.2 Smoothing and seasonal decomposition
    15.2.1 Smoothing with simple moving averages
    15.2.2 Seasonal decomposition
  15.3 Exponential forecasting models
    15.3.1 Simple exponential smoothing
    15.3.2 Holt and Holt–Winters exponential smoothing
    15.3.3 The ets() function and automated forecasting
  15.4 ARIMA forecasting models
    15.4.1 Prerequisite concepts
    15.4.2 ARMA and ARIMA models
    15.4.3 Automated ARIMA forecasting
  15.5 Going further
  Summary
16 Cluster analysis
  16.1 Common steps in cluster analysis
  16.2 Calculating distances
  16.3 Hierarchical cluster analysis
  16.4 Partitioning-cluster analysis
    16.4.1 K-means clustering
    16.4.2 Partitioning around medoids
  16.5 Avoiding nonexistent clusters
  16.6 Going further
  Summary
17 Classification
  17.1 Preparing the data
  17.2 Logistic regression
  17.3 Decision trees
    17.3.1 Classical decision trees
    17.3.2 Conditional inference trees
  17.4 Random forests
  17.5 Support vector machines
    17.5.1 Tuning an SVM
  17.6 Choosing a best predictive solution
  17.7 Understanding black box predictions
    17.7.1 Break-down plots
    17.7.2 Plotting Shapley values
  17.8 Going further
  Summary
18 Advanced methods for missing data
  18.1 Steps in dealing with missing data
  18.2 Identifying missing values
  18.3 Exploring missing-values patterns
    18.3.1 Visualizing missing values
    18.3.2 Using correlations to explore missing values
  18.4 Understanding the sources and impact of missing data
  18.5 Rational approaches for dealing with incomplete data
  18.6 Deleting missing data
    18.6.1 Complete-case analysis (listwise deletion)
    18.6.2 Available case analysis (pairwise deletion)
  18.7 Single imputation
    18.7.1 Simple imputation
    18.7.2 K-nearest neighbor imputation
    18.7.3 missForest
  18.8 Multiple imputation
  18.9 Other approaches to missing data
  Summary
Part 5. Expanding your skills
19 Advanced graphs
  19.1 Modifying scales
    19.1.1 Customizing axes
    19.1.2 Customizing colors
  19.2 Modifying themes
    19.2.1 Prepackaged themes
    19.2.2 Customizing fonts
    19.2.3 Customizing legends
    19.2.4 Customizing the plot area
  19.3 Adding annotations
  19.4 Combining graphs
  19.5 Making graphs interactive
  Summary
20 Advanced programming
  20.1 A review of the language
    20.1.1 Data types
    20.1.2 Control structures
    20.1.3 Creating functions
  20.2 Working with environments
  20.3 Non-standard evaluation
  20.4 Object-oriented programming
    20.4.1 Generic functions
    20.4.2 Limitations of the S3 model
  20.5 Writing efficient code
    20.5.1 Efficient data input
    20.5.2 Vectorization
    20.5.3 Correctly sizing objects
    20.5.4 Parallelization
  20.6 Debugging
    20.6.1 Common sources of errors
    20.6.2 Debugging tools
    20.6.3 Session options that support debugging
    20.6.4 Using RStudio’s visual debugger
  20.7 Going further
  Summary
21 Creating dynamic reports
  21.1 A template approach to reports
  21.2 Creating a report with R and R Markdown
  21.3 Creating a report with R and LaTeX
    21.3.1 Creating a parameterized report
  21.4 Avoiding common R Markdown problems
  21.5 Going further
  Summary
22 Creating a package
  22.1 The edatools package
  22.2 Creating a package
    22.2.1 Installing development tools
    22.2.2 Creating a package project
    22.2.3 Writing the package functions
    22.2.4 Adding function documentation
    22.2.5 Adding a general help file (optional)
    22.2.6 Adding sample data to the package (optional)
    22.2.7 Adding a vignette (optional)
    22.2.8 Editing the DESCRIPTION file
    22.2.9 Building and installing the package
  22.3 Sharing your package
    22.3.1 Distributing a source package file
    22.3.2 Submitting to CRAN
    22.3.3 Hosting on GitHub
    22.3.4 Creating a package website
  22.4 Going further
  Summary
Afterword. Into the rabbit hole
Appendix A. Graphical user interfaces
Appendix B. Customizing the startup environment
Appendix C. Exporting data from R
  C.1 Delimited text file
  C.2 Excel spreadsheet
  C.3 Statistical applications
Appendix D. Matrix algebra in R
Appendix E. Packages used in this book
Appendix F. Working with large datasets
  F.1 Efficient programming
  F.2 Storing data outside of RAM
  F.3 Analytic packages for out-of-memory data
  F.4 Comprehensive solutions for working with enormous datasets
Appendix G. Updating an R installation
  G.1 Automated installation (Windows only)
  G.2 Manual installation (Windows and macOS)
  G.3 Updating an R installation (Linux)
References
index