A start-to-finish guide to one of the most useful programming languages for researchers in a variety of fields
In the newly revised Third Edition of The R Book, a team of distinguished teachers and researchers delivers a user-friendly and comprehensive discussion of foundational and advanced topics in the R software language, which is used widely in science, engineering, medicine, economics, and other fields. The book is designed to be used as both a complete text—readable from cover to cover—and as a reference manual for practitioners seeking authoritative guidance on particular topics.
This latest edition offers instruction on the use of the RStudio GUI, an easy-to-use environment for those new to R. It provides readers with a complete walkthrough of the R language, beginning at a point that assumes no prior knowledge of R and very little previous knowledge of statistics. Readers will also find:
- A thorough introduction to fundamental concepts in statistics and step-by-step roadmaps to their implementation in R;
- Comprehensive explorations of worked examples in R;
- A complementary companion website with downloadable datasets that are used in the book;
- In-depth examination of essential R packages.
Perfect for undergraduate and postgraduate students of science, engineering, medicine, economics, and geography, The R Book will also earn a place in the libraries of social sciences professionals.
Cover, Title Page, Copyright, Contents, List of Tables, Preface, Acknowledgments, About the Companion Website
Chapter 1 Getting Started: 1.1 Navigating the book 1.1.1 How to use this book 1.2 R vs. RStudio 1.3 Installing R and RStudio 1.4 Using RStudio 1.4.1 Using R directly via the console 1.4.2 Using text editors 1.5 The Comprehensive R Archive Network 1.5.1 Manuals 1.5.2 Frequently asked questions 1.5.3 Contributed documentation 1.6 Packages in R 1.6.1 Contents of packages 1.6.2 Finding packages 1.6.3 Installing packages 1.7 Getting help in R 1.7.1 Worked examples of functions 1.7.2 Demonstrations of R functions 1.8 Good housekeeping 1.8.1 Variable types 1.8.2 What's loaded or defined in the current session 1.8.3 Attaching and detaching objects 1.8.4 Projects 1.9 Linking to other computer languages 1.9 References
Chapter 2 Technical Background: 2.1 Mathematical functions 2.1.1 Logarithms and exponentials 2.1.2 Trigonometric functions 2.1.3 Power laws 2.1.4 Polynomial functions 2.1.5 Gamma function 2.1.6 Asymptotic functions 2.1.7 Sigmoid (S‐shaped) functions 2.1.8 Biexponential function 2.1.9 Transformations of model variables 2.2 Matrices 2.2.1 Matrix multiplication 2.2.2 Diagonals of matrices 2.2.3 Determinants 2.2.4 Inverse of a matrix 2.2.5 Eigenvalues and eigenvectors 2.2.6 Solving systems of linear equations using matrices 2.3 Calculus 2.3.1 Differentiation 2.3.2 Integration 2.3.3 Differential equations 2.4 Probability 2.4.1 The central limit theorem 2.4.2 Conditional probability 2.5 Statistics 2.5.1 Least squares 2.5.2 Maximum likelihood 2.5 Reference
Chapter 3 Essentials of the R Language: 3.1 Calculations 3.1.1 Complex numbers 3.1.2 Rounding 3.1.3 Arithmetic 3.1.4 Modular arithmetic 3.1.5 Operators 3.1.6 Integers 3.2 Naming objects 3.3 Factors 3.4 Logical operations 3.4.1 TRUE, T, FALSE, F 3.4.2 Testing for equality of real numbers 3.4.3 Testing for equality of non‐numeric objects 3.4.4 Evaluation of combinations of TRUE and FALSE 3.4.5 Logical arithmetic 3.5 Generating sequences 3.5.1 Generating repeats 3.5.2 Generating factor levels 3.6 Class membership 3.7 Missing values, infinity, and things that are not numbers 3.7.1 Missing values: NA 3.8 Vectors and subscripts 3.8.1 Extracting elements of a vector using subscripts 3.8.2 Classes of vector 3.8.3 Naming elements within vectors 3.9 Working with logical subscripts 3.10 Vector functions 3.10.1 Obtaining tables using tapply() 3.10.2 Applying functions to vectors using sapply() 3.10.3 The aggregate() function for grouped summary statistics 3.10.4 Parallel minima and maxima: pmin and pmax 3.10.5 Finding closest values 3.10.6 Sorting, ranking, and ordering 3.10.7 Understanding the difference between unique() and duplicated() 3.10.8 Looking for runs of numbers within vectors 3.10.9 Sets: union(), intersect(), and setdiff() 3.11 Matrices and arrays 3.11.1 Matrices 3.11.2 Naming the rows and columns of matrices 3.11.3 Calculations on rows or columns of matrices 3.11.4 Adding rows and columns to matrices 3.11.5 The sweep() function 3.11.6 Applying functions to matrices 3.11.7 Scaling a matrix 3.11.8 Using the max.col() function 3.11.9 Restructuring a multi‐dimensional array using aperm() 3.12 Random numbers, sampling, and shuffling 3.12.1 The sample() function 3.13 Loops and repeats 3.13.1 More complicated while() loops 3.13.2 Loop avoidance 3.13.3 The slowness of loops 3.13.4 Do not ‘grow’ data sets by concatenation or recursive function calls 3.13.5 Loops for producing time series 3.14 Lists 3.14.1 Summarising lists and lapply() 3.14.2 Manipulating and saving lists 3.15 Text, character strings, and pattern matching 3.15.1 Pasting character strings together 3.15.2 Extracting parts of strings 3.15.3 Counting things within strings 3.15.4 Upper and lower case text 3.15.5 The match() function and relational databases 3.15.6 Pattern matching 3.15.7 Substituting text within character strings 3.15.8 Locations of a pattern within a vector 3.15.9 Comparing vectors using %in% and which() 3.15.10 Stripping patterned text out of complex strings 3.16 Dates and times in R 3.16.1 Reading time data from files 3.16.2 Calculations with dates and times 3.16.3 Generating sequences of dates 3.16.4 Calculating time differences between the rows of a dataframe 3.16.5 Regression using dates and times 3.17 Environments 3.17.1 Using attach() or not! 3.17.2 Using attach() in this book 3.18 Writing R functions 3.18.1 Arithmetic mean of a single sample 3.18.2 Median of a single sample 3.18.3 Geometric mean 3.18.4 Harmonic mean 3.18.5 Variance 3.18.6 Variance ratio test 3.18.7 Using the variance 3.18.8 Plots and deparsing in functions 3.18.9 The switch() function 3.18.10 Arguments in our function 3.18.11 Errors in our functions 3.18.12 Outputs from our function 3.19 Structure of R objects 3.20 Writing from R to a file 3.20.1 Saving data objects 3.20.2 Saving command history 3.20.3 Saving graphics or plots 3.20.4 Saving data for a spreadsheet 3.20.5 Saving output from functions to a file 3.21 Tips for writing R code 3.21 References
Chapter 4 Data Input and Dataframes: 4.1 Working directory 4.2 Data input from files 4.2.1 Data input using read.table() and read.csv() 4.2.2 Input from files using scan() 4.2.3 Reading data from a file using readLines() 4.3 Data input directly from the web 4.4 Built‐in data files 4.5 Dataframes 4.5.1 Subscripts and indices 4.5.2 Selecting rows from the dataframe at random 4.5.3 Sorting dataframes 4.5.4 Using logical conditions to select rows from the dataframe 4.5.5 Omitting rows containing missing values, NA 4.5.6 A dataframe with row names instead of row numbers 4.5.7 Creating a dataframe from another kind of object 4.5.8 Eliminating duplicate rows from a dataframe 4.5.9 Dates in dataframes 4.6 Using the match() function in dataframes 4.6.1 Merging two dataframes 4.7 Adding margins to a dataframe 4.7.1 Summarising the contents of dataframes
Chapter 5 Graphics: 5.1 Plotting principles 5.1.1 Axes labels and titles 5.1.2 Plotting symbols and colours 5.1.3 Saving graphics 5.2 Plots for single variables 5.2.1 Histograms vs. bar charts 5.2.2 Histograms 5.2.3 Density plots 5.2.4 Boxplots 5.2.5 Dotplots 5.2.6 Bar charts 5.2.7 Pie charts 5.3 Plots for showing two numeric variables 5.3.1 Scatterplot 5.3.2 Plots with many identical values 5.4 Plots for numeric variables by group 5.4.1 Boxplots by group 5.4.2 Dotplots by group 5.4.3 An inferior (but popular) option 5.5 Plots showing two categorical variables 5.5.1 Grouped bar charts 5.5.2 Mosaic plots 5.6 Plots for three (or more) variables 5.6.1 Plots of all pairs of variables 5.6.2 Incorporating a third variable on a scatterplot 5.6.3 Basic 3D plots 5.7 Trellis graphics 5.7.1 Panel boxplots 5.7.2 Panel scatterplots 5.7.3 Panel barplots 5.7.4 Panels for conditioning plots 5.7.5 Panel histograms 5.7.6 More panel functions 5.8 Plotting functions 5.8.1 Two‐dimensional plots 5.8.2 Three‐dimensional plots 5.8 References
Chapter 6 Graphics in More Detail: 6.1 More on colour 6.1.1 Colour palettes with categorical data 6.1.2 The RColorBrewer package 6.1.3 Foreground colours 6.1.4 Background colours 6.1.5 Background colour for legends 6.1.6 Different colours for different parts of the graph 6.1.7 Full control of colours in plots 6.1.8 Cross‐hatching and grey scale 6.2 Changing the look of graphics 6.2.1 Shape and size of plot 6.2.2 Multiple plots on one screen 6.2.3 Tickmarks and associated labels 6.2.4 Font of text 6.3 Adding items to plots 6.3.1 Adding text 6.3.2 Adding smooth parametric curves to a scatterplot 6.3.3 Fitting non‐parametric curves through a scatterplot 6.3.4 Connecting observations 6.3.5 Adding shapes 6.3.6 Adding mathematical and other symbols 6.4 The grammar of graphics and ggplot2 6.4.1 Basic structure 6.4.2 Examples 6.5 Graphics cheat sheet 6.5.1 Text justification, adj 6.5.2 Annotation of graphs, ann 6.5.3 Delay moving on to the next in a series of plots, ask 6.5.4 Control over the axes, axis 6.5.5 Background colour for plots, bg 6.5.6 Boxes around plots, bty 6.5.7 Size of plotting symbols using the character expansion function, cex 6.5.8 Changing the shape of the plotting region, plt 6.5.9 Locating multiple graphs in non‐standard layouts using fig 6.5.10 Two graphs with a common X scale but different Y scales using fig 6.5.11 The layout function 6.5.12 Creating and controlling multiple screens on a single device 6.5.13 Orientation of numbers on the tick marks, las 6.5.14 Shapes for the ends and joins of lines, lend and ljoin 6.5.15 Line types, lty 6.5.16 Line widths, lwd 6.5.17 Several graphs on the same page, mfrow and mfcol 6.5.18 Margins around the plotting area, mar 6.5.19 Plotting more than one graph on the same axes, new 6.5.20 Outer margins, oma 6.5.21 Packing graphs closer together 6.5.22 Square plotting region, pty 6.5.23 Character rotation, srt 6.5.24 Rotating the axis labels 6.5.25 Tick marks on the axes 6.5.26 Axis styles 6.5.27 Summary 6.5 References
Chapter 7 Tables: 7.1 Tabulating categorical or discrete data 7.1.1 Tables of counts 7.1.2 Tables of proportions 7.2 Tabulating summaries of numeric data 7.2.1 General summaries by group 7.2.2 Bespoke summaries by group 7.3 Converting between tables and dataframes 7.3.1 From a table to a dataframe 7.3.2 From a dataframe to a table 7.3 Reference
Chapter 8 Probability Distributions in R: 8.1 Probability distributions: the basics 8.1.1 Discrete and continuous probability distributions 8.1.2 Describing probability distributions mathematically 8.1.3 Independence 8.2 Probability distributions in R 8.3 Continuous probability distributions 8.3.1 The Normal (or Gaussian) distribution 8.3.2 The Uniform distribution 8.3.3 The Chi‐squared distribution 8.3.4 The F distribution 8.3.5 Student's t distribution 8.3.6 The Gamma distribution 8.3.7 The Exponential distribution 8.3.8 The Beta distribution 8.3.9 The Lognormal distribution 8.3.10 The Logistic distribution 8.3.11 The Weibull distribution 8.3.12 Multivariate Normal distribution 8.4 Discrete probability distributions 8.4.1 The Bernoulli distribution 8.4.2 The Binomial distribution 8.4.3 The Geometric distribution 8.4.4 The Hypergeometric distribution 8.4.5 The Multinomial distribution 8.4.6 The Poisson distribution 8.4.7 The Negative Binomial distribution 8.5 The central limit theorem 8.5 References
Chapter 9 Testing: 9.1 Principles 9.1.1 Defining the question to be tested 9.1.2 Assumptions 9.1.3 Interpreting results 9.2 Continuous data 9.2.1 Single population average 9.2.2 Two population averages 9.2.3 Multiple population averages 9.2.4 Population distribution 9.2.5 Checking and testing for normality 9.2.6 Comparing variances 9.3 Discrete and categorical data 9.3.1 Sign test 9.3.2 Test to compare proportions 9.3.3 Contingency tables 9.3.4 Testing contingency tables 9.4 Bootstrapping 9.5 Multiple tests 9.6 Power and sample size calculations 9.7 A table of tests 9.7 References
Chapter 10 Regression: 10.1 The simple linear regression model 10.1.1 Model format and assumptions 10.1.2 Building a simple linear regression model 10.2 The multiple linear regression model 10.2.1 Model format and assumptions 10.2.2 Building a multiple linear regression model 10.2.3 Categorical covariates 10.2.4 Interactions between covariates 10.3 Understanding the output 10.3.1 Residuals 10.3.2 Estimates of coefficients 10.3.3 Testing individual coefficients 10.3.4 Residual standard error 10.3.5 R2 and its variants 10.3.6 The regression F‐test 10.3.7 ANOVA: Same model, different output 10.3.8 Extracting model information 10.4 Fitting models 10.4.1 The principle of parsimony 10.4.2 First plot the data 10.4.3 Comparing nested models 10.4.4 Comparing non‐nested models 10.4.5 Dealing with large numbers of covariates 10.5 Checking model assumptions 10.5.1 Residuals and standardised residuals 10.5.2 Checking for linearity 10.5.3 Checking for homoscedasticity of errors 10.5.4 Checking for normality of errors 10.5.5 Checking for independence of errors 10.5.6 Checking for influential observations 10.5.7 Checking for collinearity 10.5.8 Improving fit 10.6 Using the model 10.6.1 Interpretation of model 10.6.2 Making predictions 10.7 Further types of regression modelling 10.7 References
Chapter 11 Generalised Linear Models: 11.1 How GLMs work 11.1.1 Error structure 11.1.2 Linear predictor 11.1.3 Link function 11.1.4 Model checking 11.1.5 Interpretation and prediction 11.2 Count data and GLMs 11.2.1 A straightforward example 11.2.2 Dispersion 11.2.3 An alternative to Poisson counts 11.3 Count table data and GLMs 11.3.1 Log‐linear models 11.3.2 All covariates might be useful 11.3.3 Spine plot 11.4 Proportion data and GLMs 11.4.1 Theoretical background 11.4.2 Logistic regression with binomial errors 11.4.3 Predicting x from y 11.4.4 Proportion data with categorical explanatory variables 11.4.5 Binomial GLM with ordered categorical covariates 11.4.6 Binomial GLM with categorical and continuous covariates 11.4.7 Revisiting lizards 11.5 Binary Response Variables and GLMs 11.5.1 A straightforward example 11.5.2 Graphical tests of the fit of the logistic curve to data 11.5.3 Mixed covariate types with a binary response 11.5.4 Spine plot and logistic regression 11.6 Bootstrapping a GLM 11.6 References
Chapter 12 Generalised Additive Models: 12.1 Smoothing example 12.2 Straightforward examples of GAMs 12.3 Background to using GAMs 12.3.1 Smoothing 12.3.2 Suggestions for using gam() 12.4 More complex GAM examples 12.4.1 Back to Ozone 12.4.2 An example with strongly humped data 12.4.3 GAMs with binary data 12.4.4 Three‐dimensional graphic output from gam 12.4 References
Chapter 13 Mixed‐Effect Models: 13.1 Regression with categorical covariates 13.2 An alternative method: random effects 13.3 Common data structures where random effects are useful 13.3.1 Nested (hierarchical) structures 13.3.2 Non‐nested structures 13.3.3 Longitudinal structures 13.4 R packages to deal with mixed effects models 13.4.1 The nlme package 13.4.2 The lme4 package 13.4.3 Methods for fitting mixed models 13.5 Examples of implementing random effect models 13.5.1 Multilevel data (two levels) 13.5.2 Multilevel data (three levels) 13.5.3 Designed experiment: split‐plot 13.5.4 Longitudinal data 13.6 Generalised linear mixed models 13.6.1 Logistic mixed model 13.7 Alternatives to mixed models 13.7 References
Chapter 14 Non‐linear Regression: 14.1 Example: modelling deer jaw bone length 14.1.1 An exponential model for the deer data 14.1.2 A Michaelis–Menten model for the deer data 14.1.3 Comparison of the exponential and the Michaelis–Menten model 14.2 Example: grouped data 14.3 Self‐starting functions 14.3.1 Self‐starting Michaelis–Menten model 14.3.2 Self‐starting asymptotic exponential model 14.3.3 Self‐starting logistic 14.3.4 Self‐starting four‐parameter logistic 14.4 Further considerations 14.4.1 Model checking 14.4.2 Confidence intervals 14.4 References
Chapter 15 Survival Analysis: 15.1 Handling survival data 15.1.1 Structure of a survival dataset 15.1.2 Survival data in R 15.2 The survival and hazard functions 15.2.1 Non‐parametric estimation of the survival function 15.2.2 Parametric estimation of the survival function 15.3 Modelling survival data 15.3.1 The data 15.3.2 The Cox proportional hazard model 15.3.3 Accelerated failure time models 15.3.4 Cox proportional hazard or a parametric model? 15.3 References
Chapter 16 Designed Experiments: 16.1 Factorial experiments 16.1.1 Expanding data 16.2 Pseudo‐replication 16.2.1 Split‐plot effects 16.2.2 Removing pseudo‐replication 16.2.3 Derived variable analysis 16.3 Contrasts 16.3.1 Contrast coefficients 16.3.2 An example of contrasts using R 16.3.3 Model simplification for contrasts 16.3.4 Helmert contrasts 16.3.5 Sum contrasts 16.3.6 Polynomial contrasts 16.3.7 Contrasts with multiple covariates 16.3 References
Chapter 17 Meta‐Analysis: 17.1 Elements of a meta‐analysis 17.1.1 Choosing studies for a meta‐analysis 17.1.2 Effects and effect size 17.1.3 Weights 17.1.4 Fixed vs. random effect models 17.2 Meta‐analysis in R 17.2.1 Formatting information from studies 17.2.2 Computing the inputs of a meta‐analysis 17.2.3 Conducting the meta‐analysis 17.3 Examples 17.3.1 Meta‐analysis of scaled differences 17.4 Meta‐analysis of categorical data 17.4 References
Chapter 18 Time Series: 18.1 Moving average 18.2 Blowflies 18.3 Seasonal data 18.3.1 Point of view 18.3.2 Built in ts() functions 18.3.3 Cycles 18.3.4 Testing for a time series trend 18.4 Multiple time series 18.5 Some theoretical background 18.5.1 Autocorrelation 18.5.2 Autoregressive models 18.5.3 Partial autocorrelation 18.5.4 Moving average models 18.5.5 More general models: ARMA and ARIMA 18.6 ARIMA example 18.7 Simulation of time series 18.7 Reference
Chapter 19 Multivariate Statistics: 19.1 Visualising data 19.2 Multivariate analysis of variance 19.3 Principal component analysis 19.4 Factor analysis 19.5 Cluster analysis 19.5.1 k‐means 19.6 Hierarchical cluster analysis 19.7 Discriminant analysis 19.8 Neural networks 19.8 References
Chapter 20 Classification and Regression Trees: 20.1 How CARTs work 20.2 Regression trees 20.2.1 The tree package 20.2.2 The rpart package 20.2.3 Comparison with linear regression 20.2.4 Model simplification 20.3 Classification trees 20.3.1 Classification trees with categorical explanatory variables 20.3.2 Classification trees for replicated data 20.4 Looking for patterns 20.4 References
Chapter 21 Spatial Statistics: 21.1 Spatial point processes 21.1.1 How can we check for randomness? 21.1.2 Models 21.1.3 Marks 21.2 Geospatial statistics 21.2.1 Models 21.2 References
Chapter 22 Bayesian Statistics: 22.1 Components of a Bayesian Analysis 22.1.1 The likelihood (the model and data) 22.1.2 Priors 22.1.3 The Posterior 22.1.4 Markov chain Monte Carlo (MCMC) 22.1.5 Considerations for MCMC 22.1.6 Inference 22.1.7 The Pros and Cons of going Bayesian 22.2 Bayesian analysis in R 22.2.1 Installing JAGS 22.2.2 Running JAGS in R 22.2.3 Writing BUGS models 22.3 Examples 22.3.1 MCMC for a simple linear regression 22.3.2 MCMC for longitudinal data 22.4 MCMC for a model with binomial errors 22.4 References
Chapter 23 Simulation Models: 23.1 Temporal dynamics 23.1.1 Chaotic dynamics in population size 23.1.2 Investigating the route to chaos 23.2 Spatial simulation models 23.2.1 Meta‐population dynamics 23.2.2 Coexistence resulting from spatially explicit (local) density dependence 23.2.3 Pattern generation resulting from dynamic interactions 23.3 Temporal and spatial dynamics: random walk 23.3 References
Index, EULA