The R Book, 3rd Edition

by Elinor Jones, Michael J. Crawley, Simon Harden

Length: 880 pages
Edition: 3
Language: English
Publisher: Wiley
Publication Date: 2022-11-21
ISBN-10: 1119634326
ISBN-13: 9781119634324
Sales Rank: #213821

A start-to-finish guide to one of the most useful programming languages for researchers in a variety of fields

In the newly revised Third Edition of The R Book, a team of distinguished teachers and researchers delivers a user-friendly and comprehensive discussion of foundational and advanced topics in the R software language, which is used widely in science, engineering, medicine, economics, and other fields. The book is designed to be used both as a complete text, readable from cover to cover, and as a reference manual for practitioners seeking authoritative guidance on particular topics.

This latest edition offers instruction on the use of the RStudio GUI, an easy-to-use environment for those new to R. It provides readers with a complete walkthrough of the R language, beginning at a point that assumes no prior knowledge of R and very little previous knowledge of statistics. Readers will also find:

A thorough introduction to fundamental concepts in statistics and step-by-step roadmaps to their implementation in R;
Comprehensive explorations of worked examples in R;
A complementary companion website with downloadable datasets that are used in the book;
In-depth examination of essential R packages.

Perfect for undergraduate and postgraduate students of science, engineering, medicine, economics, and geography, The R Book will also earn a place in the libraries of social science professionals.

Cover
Title Page
Copyright
Contents
List of Tables
Preface
Acknowledgments
About the Companion Website
Chapter 1 Getting Started
	1.1 Navigating the book
		1.1.1 How to use this book
	1.2 R vs. RStudio
	1.3 Installing R and RStudio
	1.4 Using RStudio
		1.4.1 Using R directly via the console
		1.4.2 Using text editors
	1.5 The Comprehensive R Archive Network
		1.5.1 Manuals
		1.5.2 Frequently asked questions
		1.5.3 Contributed documentation
	1.6 Packages in R
		1.6.1 Contents of packages
		1.6.2 Finding packages
		1.6.3 Installing packages
	1.7 Getting help in R
		1.7.1 Worked examples of functions
		1.7.2 Demonstrations of R functions
	1.8 Good housekeeping
		1.8.1 Variable types
		1.8.2 What's loaded or defined in the current session
		1.8.3 Attaching and detaching objects
		1.8.4 Projects
	1.9 Linking to other computer languages
	References
Chapter 2 Technical Background
	2.1 Mathematical functions
		2.1.1 Logarithms and exponentials
		2.1.2 Trigonometric functions
		2.1.3 Power laws
		2.1.4 Polynomial functions
		2.1.5 Gamma function
		2.1.6 Asymptotic functions
		2.1.7 Sigmoid (S‐shaped) functions
		2.1.8 Biexponential function
		2.1.9 Transformations of model variables
	2.2 Matrices
		2.2.1 Matrix multiplication
		2.2.2 Diagonals of matrices
		2.2.3 Determinants
		2.2.4 Inverse of a matrix
		2.2.5 Eigenvalues and eigenvectors
		2.2.6 Solving systems of linear equations using matrices
	2.3 Calculus
		2.3.1 Differentiation
		2.3.2 Integration
		2.3.3 Differential equations
	2.4 Probability
		2.4.1 The central limit theorem
		2.4.2 Conditional probability
	2.5 Statistics
		2.5.1 Least squares
		2.5.2 Maximum likelihood
	Reference
Chapter 3 Essentials of the R Language
	3.1 Calculations
		3.1.1 Complex numbers
		3.1.2 Rounding
		3.1.3 Arithmetic
		3.1.4 Modular arithmetic
		3.1.5 Operators
		3.1.6 Integers
	3.2 Naming objects
	3.3 Factors
	3.4 Logical operations
		3.4.1 TRUE, T, FALSE, F
		3.4.2 Testing for equality of real numbers
		3.4.3 Testing for equality of non‐numeric objects
		3.4.4 Evaluation of combinations of TRUE and FALSE
		3.4.5 Logical arithmetic
	3.5 Generating sequences
		3.5.1 Generating repeats
		3.5.2 Generating factor levels
	3.6 Class membership
	3.7 Missing values, infinity, and things that are not numbers
		3.7.1 Missing values: NA
	3.8 Vectors and subscripts
		3.8.1 Extracting elements of a vector using subscripts
		3.8.2 Classes of vector
		3.8.3 Naming elements within vectors
	3.9 Working with logical subscripts
	3.10 Vector functions
		3.10.1 Obtaining tables using tapply()
		3.10.2 Applying functions to vectors using sapply()
		3.10.3 The aggregate() function for grouped summary statistics
		3.10.4 Parallel minima and maxima: pmin and pmax
		3.10.5 Finding closest values
		3.10.6 Sorting, ranking, and ordering
		3.10.7 Understanding the difference between unique() and duplicated()
		3.10.8 Looking for runs of numbers within vectors
		3.10.9 Sets: union(), intersect(), and setdiff()
	3.11 Matrices and arrays
		3.11.1 Matrices
		3.11.2 Naming the rows and columns of matrices
		3.11.3 Calculations on rows or columns of matrices
		3.11.4 Adding rows and columns to matrices
		3.11.5 The sweep() function
		3.11.6 Applying functions to matrices
		3.11.7 Scaling a matrix
		3.11.8 Using the max.col() function
		3.11.9 Restructuring a multi‐dimensional array using aperm()
	3.12 Random numbers, sampling, and shuffling
		3.12.1 The sample() function
	3.13 Loops and repeats
		3.13.1 More complicated while() loops
		3.13.2 Loop avoidance
		3.13.3 The slowness of loops
		3.13.4 Do not ‘grow’ data sets by concatenation or recursive function calls
		3.13.5 Loops for producing time series
	3.14 Lists
		3.14.1 Summarising lists and lapply()
		3.14.2 Manipulating and saving lists
	3.15 Text, character strings, and pattern matching
		3.15.1 Pasting character strings together
		3.15.2 Extracting parts of strings
		3.15.3 Counting things within strings
		3.15.4 Upper and lower case text
		3.15.5 The match() function and relational databases
		3.15.6 Pattern matching
		3.15.7 Substituting text within character strings
		3.15.8 Locations of a pattern within a vector
		3.15.9 Comparing vectors using %in% and which()
		3.15.10 Stripping patterned text out of complex strings
	3.16 Dates and times in R
		3.16.1 Reading time data from files
		3.16.2 Calculations with dates and times
		3.16.3 Generating sequences of dates
		3.16.4 Calculating time differences between the rows of a dataframe
		3.16.5 Regression using dates and times
	3.17 Environments
		3.17.1 Using attach() or not!
		3.17.2 Using attach() in this book
	3.18 Writing R functions
		3.18.1 Arithmetic mean of a single sample
		3.18.2 Median of a single sample
		3.18.3 Geometric mean
		3.18.4 Harmonic mean
		3.18.5 Variance
		3.18.6 Variance ratio test
		3.18.7 Using the variance
		3.18.8 Plots and deparsing in functions
		3.18.9 The switch() function
		3.18.10 Arguments in our function
		3.18.11 Errors in our functions
		3.18.12 Outputs from our function
	3.19 Structure of R objects
	3.20 Writing from R to a file
		3.20.1 Saving data objects
		3.20.2 Saving command history
		3.20.3 Saving graphics or plots
		3.20.4 Saving data for a spreadsheet
		3.20.5 Saving output from functions to a file
	3.21 Tips for writing R code
	References
Chapter 4 Data Input and Dataframes
	4.1 Working directory
	4.2 Data input from files
		4.2.1 Data input using read.table() and read.csv()
		4.2.2 Input from files using scan()
		4.2.3 Reading data from a file using readLines()
	4.3 Data input directly from the web
	4.4 Built‐in data files
	4.5 Dataframes
		4.5.1 Subscripts and indices
		4.5.2 Selecting rows from the dataframe at random
		4.5.3 Sorting dataframes
		4.5.4 Using logical conditions to select rows from the dataframe
		4.5.5 Omitting rows containing missing values, NA
		4.5.6 A dataframe with row names instead of row numbers
		4.5.7 Creating a dataframe from another kind of object
		4.5.8 Eliminating duplicate rows from a dataframe
		4.5.9 Dates in dataframes
	4.6 Using the match() function in dataframes
		4.6.1 Merging two dataframes
	4.7 Adding margins to a dataframe
		4.7.1 Summarising the contents of dataframes
Chapter 5 Graphics
	5.1 Plotting principles
		5.1.1 Axes labels and titles
		5.1.2 Plotting symbols and colours
		5.1.3 Saving graphics
	5.2 Plots for single variables
		5.2.1 Histograms vs. bar charts
		5.2.2 Histograms
		5.2.3 Density plots
		5.2.4 Boxplots
		5.2.5 Dotplots
		5.2.6 Bar charts
		5.2.7 Pie charts
	5.3 Plots for showing two numeric variables
		5.3.1 Scatterplot
		5.3.2 Plots with many identical values
	5.4 Plots for numeric variables by group
		5.4.1 Boxplots by group
		5.4.2 Dotplots by group
		5.4.3 An inferior (but popular) option
	5.5 Plots showing two categorical variables
		5.5.1 Grouped bar charts
		5.5.2 Mosaic plots
	5.6 Plots for three (or more) variables
		5.6.1 Plots of all pairs of variables
		5.6.2 Incorporating a third variable on a scatterplot
		5.6.3 Basic 3D plots
	5.7 Trellis graphics
		5.7.1 Panel boxplots
		5.7.2 Panel scatterplots
		5.7.3 Panel barplots
		5.7.4 Panels for conditioning plots
		5.7.5 Panel histograms
		5.7.6 More panel functions
	5.8 Plotting functions
		5.8.1 Two‐dimensional plots
		5.8.2 Three‐dimensional plots
	References
Chapter 6 Graphics in More Detail
	6.1 More on colour
		6.1.1 Colour palettes with categorical data
		6.1.2 The RColorBrewer package
		6.1.3 Foreground colours
		6.1.4 Background colours
		6.1.5 Background colour for legends
		6.1.6 Different colours for different parts of the graph
		6.1.7 Full control of colours in plots
		6.1.8 Cross‐hatching and grey scale
	6.2 Changing the look of graphics
		6.2.1 Shape and size of plot
		6.2.2 Multiple plots on one screen
		6.2.3 Tickmarks and associated labels
		6.2.4 Font of text
	6.3 Adding items to plots
		6.3.1 Adding text
		6.3.2 Adding smooth parametric curves to a scatterplot
		6.3.3 Fitting non‐parametric curves through a scatterplot
		6.3.4 Connecting observations
		6.3.5 Adding shapes
		6.3.6 Adding mathematical and other symbols
	6.4 The grammar of graphics and ggplot2
		6.4.1 Basic structure
		6.4.2 Examples
	6.5 Graphics cheat sheet
		6.5.1 Text justification, adj
		6.5.2 Annotation of graphs, ann
		6.5.3 Delay moving on to the next in a series of plots, ask
		6.5.4 Control over the axes, axis
		6.5.5 Background colour for plots, bg
		6.5.6 Boxes around plots, bty
		6.5.7 Size of plotting symbols using the character expansion function, cex
		6.5.8 Changing the shape of the plotting region, plt
		6.5.9 Locating multiple graphs in non‐standard layouts using fig
		6.5.10 Two graphs with a common X scale but different Y scales using fig
		6.5.11 The layout function
		6.5.12 Creating and controlling multiple screens on a single device
		6.5.13 Orientation of numbers on the tick marks, las
		6.5.14 Shapes for the ends and joins of lines, lend and ljoin
		6.5.15 Line types, lty
		6.5.16 Line widths, lwd
		6.5.17 Several graphs on the same page, mfrow and mfcol
		6.5.18 Margins around the plotting area, mar
		6.5.19 Plotting more than one graph on the same axes, new
		6.5.20 Outer margins, oma
		6.5.21 Packing graphs closer together
		6.5.22 Square plotting region, pty
		6.5.23 Character rotation, srt
		6.5.24 Rotating the axis labels
		6.5.25 Tick marks on the axes
		6.5.26 Axis styles
		6.5.27 Summary
	References
Chapter 7 Tables
	7.1 Tabulating categorical or discrete data
		7.1.1 Tables of counts
		7.1.2 Tables of proportions
	7.2 Tabulating summaries of numeric data
		7.2.1 General summaries by group
		7.2.2 Bespoke summaries by group
	7.3 Converting between tables and dataframes
		7.3.1 From a table to a dataframe
		7.3.2 From a dataframe to a table
	Reference
Chapter 8 Probability Distributions in R
	8.1 Probability distributions: the basics
		8.1.1 Discrete and continuous probability distributions
		8.1.2 Describing probability distributions mathematically
		8.1.3 Independence
	8.2 Probability distributions in R
	8.3 Continuous probability distributions
		8.3.1 The Normal (or Gaussian) distribution
		8.3.2 The Uniform distribution
		8.3.3 The Chi‐squared distribution
		8.3.4 The F distribution
		8.3.5 Student's t distribution
		8.3.6 The Gamma distribution
		8.3.7 The Exponential distribution
		8.3.8 The Beta distribution
		8.3.9 The Lognormal distribution
		8.3.10 The Logistic distribution
		8.3.11 The Weibull distribution
		8.3.12 Multivariate Normal distribution
	8.4 Discrete probability distributions
		8.4.1 The Bernoulli distribution
		8.4.2 The Binomial distribution
		8.4.3 The Geometric distribution
		8.4.4 The Hypergeometric distribution
		8.4.5 The Multinomial distribution
		8.4.6 The Poisson distribution
		8.4.7 The Negative Binomial distribution
	8.5 The central limit theorem
	References
Chapter 9 Testing
	9.1 Principles
		9.1.1 Defining the question to be tested
		9.1.2 Assumptions
		9.1.3 Interpreting results
	9.2 Continuous data
		9.2.1 Single population average
		9.2.2 Two population averages
		9.2.3 Multiple population averages
		9.2.4 Population distribution
		9.2.5 Checking and testing for normality
		9.2.6 Comparing variances
	9.3 Discrete and categorical data
		9.3.1 Sign test
		9.3.2 Test to compare proportions
		9.3.3 Contingency tables
		9.3.4 Testing contingency tables
	9.4 Bootstrapping
	9.5 Multiple tests
	9.6 Power and sample size calculations
	9.7 A table of tests
	References
Chapter 10 Regression
	10.1 The simple linear regression model
		10.1.1 Model format and assumptions
		10.1.2 Building a simple linear regression model
	10.2 The multiple linear regression model
		10.2.1 Model format and assumptions
		10.2.2 Building a multiple linear regression model
		10.2.3 Categorical covariates
		10.2.4 Interactions between covariates
	10.3 Understanding the output
		10.3.1 Residuals
		10.3.2 Estimates of coefficients
		10.3.3 Testing individual coefficients
		10.3.4 Residual standard error
		10.3.5 R² and its variants
		10.3.6 The regression F‐test
		10.3.7 ANOVA: Same model, different output
		10.3.8 Extracting model information
	10.4 Fitting models
		10.4.1 The principle of parsimony
		10.4.2 First plot the data
		10.4.3 Comparing nested models
		10.4.4 Comparing non‐nested models
		10.4.5 Dealing with large numbers of covariates
	10.5 Checking model assumptions
		10.5.1 Residuals and standardised residuals
		10.5.2 Checking for linearity
		10.5.3 Checking for homoscedasticity of errors
		10.5.4 Checking for normality of errors
		10.5.5 Checking for independence of errors
		10.5.6 Checking for influential observations
		10.5.7 Checking for collinearity
		10.5.8 Improving fit
	10.6 Using the model
		10.6.1 Interpretation of model
		10.6.2 Making predictions
	10.7 Further types of regression modelling
	References
Chapter 11 Generalised Linear Models
	11.1 How GLMs work
		11.1.1 Error structure
		11.1.2 Linear predictor
		11.1.3 Link function
		11.1.4 Model checking
		11.1.5 Interpretation and prediction
	11.2 Count data and GLMs
		11.2.1 A straightforward example
		11.2.2 Dispersion
		11.2.3 An alternative to Poisson counts
	11.3 Count table data and GLMs
		11.3.1 Log‐linear models
		11.3.2 All covariates might be useful
		11.3.3 Spine plot
	11.4 Proportion data and GLMs
		11.4.1 Theoretical background
		11.4.2 Logistic regression with binomial errors
		11.4.3 Predicting x from y
		11.4.4 Proportion data with categorical explanatory variables
		11.4.5 Binomial GLM with ordered categorical covariates
		11.4.6 Binomial GLM with categorical and continuous covariates
		11.4.7 Revisiting lizards
	11.5 Binary response variables and GLMs
		11.5.1 A straightforward example
		11.5.2 Graphical tests of the fit of the logistic curve to data
		11.5.3 Mixed covariate types with a binary response
		11.5.4 Spine plot and logistic regression
	11.6 Bootstrapping a GLM
	References
Chapter 12 Generalised Additive Models
	12.1 Smoothing example
	12.2 Straightforward examples of GAMs
	12.3 Background to using GAMs
		12.3.1 Smoothing
		12.3.2 Suggestions for using gam()
	12.4 More complex GAM examples
		12.4.1 Back to Ozone
		12.4.2 An example with strongly humped data
		12.4.3 GAMs with binary data
		12.4.4 Three‐dimensional graphic output from gam
	References
Chapter 13 Mixed‐Effect Models
	13.1 Regression with categorical covariates
	13.2 An alternative method: random effects
	13.3 Common data structures where random effects are useful
		13.3.1 Nested (hierarchical) structures
		13.3.2 Non‐nested structures
		13.3.3 Longitudinal structures
	13.4 R packages to deal with mixed effects models
		13.4.1 The nlme package
		13.4.2 The lme4 package
		13.4.3 Methods for fitting mixed models
	13.5 Examples of implementing random effect models
		13.5.1 Multilevel data (two levels)
		13.5.2 Multilevel data (three levels)
		13.5.3 Designed experiment: split‐plot
		13.5.4 Longitudinal data
	13.6 Generalised linear mixed models
		13.6.1 Logistic mixed model
	13.7 Alternatives to mixed models
	References
Chapter 14 Non‐linear Regression
	14.1 Example: modelling deer jaw bone length
		14.1.1 An exponential model for the deer data
		14.1.2 A Michaelis–Menten model for the deer data
		14.1.3 Comparison of the exponential and the Michaelis–Menten model
	14.2 Example: grouped data
	14.3 Self‐starting functions
		14.3.1 Self‐starting Michaelis–Menten model
		14.3.2 Self‐starting asymptotic exponential model
		14.3.3 Self‐starting logistic
		14.3.4 Self‐starting four‐parameter logistic
	14.4 Further considerations
		14.4.1 Model checking
		14.4.2 Confidence intervals
	References
Chapter 15 Survival Analysis
	15.1 Handling survival data
		15.1.1 Structure of a survival dataset
		15.1.2 Survival data in R
	15.2 The survival and hazard functions
		15.2.1 Non‐parametric estimation of the survival function
		15.2.2 Parametric estimation of the survival function
	15.3 Modelling survival data
		15.3.1 The data
		15.3.2 The Cox proportional hazard model
		15.3.3 Accelerated failure time models
		15.3.4 Cox proportional hazard or a parametric model?
	References
Chapter 16 Designed Experiments
	16.1 Factorial experiments
		16.1.1 Expanding data
	16.2 Pseudo‐replication
		16.2.1 Split‐plot effects
		16.2.2 Removing pseudo‐replication
		16.2.3 Derived variable analysis
	16.3 Contrasts
		16.3.1 Contrast coefficients
		16.3.2 An example of contrasts using R
		16.3.3 Model simplification for contrasts
		16.3.4 Helmert contrasts
		16.3.5 Sum contrasts
		16.3.6 Polynomial contrasts
		16.3.7 Contrasts with multiple covariates
	References
Chapter 17 Meta‐Analysis
	17.1 Elements of a meta‐analysis
		17.1.1 Choosing studies for a meta‐analysis
		17.1.2 Effects and effect size
		17.1.3 Weights
		17.1.4 Fixed vs. random effect models
	17.2 Meta‐analysis in R
		17.2.1 Formatting information from studies
		17.2.2 Computing the inputs of a meta‐analysis
		17.2.3 Conducting the meta‐analysis
	17.3 Examples
		17.3.1 Meta‐analysis of scaled differences
	17.4 Meta‐analysis of categorical data
	References
Chapter 18 Time Series
	18.1 Moving average
	18.2 Blowflies
	18.3 Seasonal data
		18.3.1 Point of view
		18.3.2 Built-in ts() functions
		18.3.3 Cycles
		18.3.4 Testing for a time series trend
	18.4 Multiple time series
	18.5 Some theoretical background
		18.5.1 Autocorrelation
		18.5.2 Autoregressive models
		18.5.3 Partial autocorrelation
		18.5.4 Moving average models
		18.5.5 More general models: ARMA and ARIMA
	18.6 ARIMA example
	18.7 Simulation of time series
	Reference
Chapter 19 Multivariate Statistics
	19.1 Visualising data
	19.2 Multivariate analysis of variance
	19.3 Principal component analysis
	19.4 Factor analysis
	19.5 Cluster analysis
		19.5.1 k‐means
	19.6 Hierarchical cluster analysis
	19.7 Discriminant analysis
	19.8 Neural networks
	References
Chapter 20 Classification and Regression Trees
	20.1 How CARTs work
	20.2 Regression trees
		20.2.1 The tree package
		20.2.2 The rpart package
		20.2.3 Comparison with linear regression
		20.2.4 Model simplification
	20.3 Classification trees
		20.3.1 Classification trees with categorical explanatory variables
		20.3.2 Classification trees for replicated data
	20.4 Looking for patterns
	References
Chapter 21 Spatial Statistics
	21.1 Spatial point processes
		21.1.1 How can we check for randomness?
		21.1.2 Models
		21.1.3 Marks
	21.2 Geospatial statistics
		21.2.1 Models
	References
Chapter 22 Bayesian Statistics
	22.1 Components of a Bayesian analysis
		22.1.1 The likelihood (the model and data)
		22.1.2 Priors
		22.1.3 The posterior
		22.1.4 Markov chain Monte Carlo (MCMC)
		22.1.5 Considerations for MCMC
		22.1.6 Inference
		22.1.7 The pros and cons of going Bayesian
	22.2 Bayesian analysis in R
		22.2.1 Installing JAGS
		22.2.2 Running JAGS in R
		22.2.3 Writing BUGS models
	22.3 Examples
		22.3.1 MCMC for a simple linear regression
		22.3.2 MCMC for longitudinal data
	22.4 MCMC for a model with binomial errors
	References
Chapter 23 Simulation Models
	23.1 Temporal dynamics
		23.1.1 Chaotic dynamics in population size
		23.1.2 Investigating the route to chaos
	23.2 Spatial simulation models
		23.2.1 Meta‐population dynamics
		23.2.2 Coexistence resulting from spatially explicit (local) density dependence
		23.2.3 Pattern generation resulting from dynamic interactions
	23.3 Temporal and spatial dynamics: random walk
	References
Index
EULA