Probability, Statistics, and Data: A Fresh Approach Using R
- Length: 512 pages
- Edition: 1
- Language: English
- Publisher: Chapman and Hall/CRC
- Publication Date: 2021-11-26
- ISBN-10: 0367436671
- ISBN-13: 9780367436674
- Sales Rank: #6801024
This book is a fresh approach to a calculus-based first course in probability and statistics, using R throughout to give a central role to data and simulation.
The book introduces probability with Monte Carlo simulation as an essential tool. Simulation makes challenging probability questions quickly accessible and easily understandable. Mathematical approaches are included, using calculus when appropriate, but are always connected to experimental computations.
Using R and simulation gives a nuanced understanding of statistical inference. The impact of departures from assumptions in statistical tests is emphasized, quantified using simulations, and demonstrated with real data. The book compares parametric and non-parametric methods through simulation, allowing for a thorough investigation of testing error and power. The text builds R skills from the outset, so that modern methods such as resampling and cross-validation can be introduced alongside traditional statistical techniques.
Fifty-two data sets are included in the companion R package fosdata. Most of these data sets come from recently published papers, so you work with current, real data, which is often large and messy. Two central chapters use powerful tidyverse tools (dplyr, ggplot2, tidyr, stringr) to wrangle data and produce meaningful visualizations. Preliminary versions of the book have been used for five semesters at Saint Louis University, and most of the more than 400 exercises have been classroom tested.
- Cover Page
- Half-Title Page
- Title Page
- Copyright Page
- Contents
- Preface
- Software Installation
- 1 Data in R
  - 1.1 Arithmetic and variable assignment
  - 1.2 Help
  - 1.3 Vectors
  - 1.4 Indexing vectors
  - 1.5 Data types
  - 1.6 Data frames
  - 1.7 Reading data from files
  - 1.8 Packages
  - 1.9 Errors and warnings
  - 1.10 Useful idioms
  - Vignette: Data science communities
  - Vignette: An R Markdown primer
  - Exercises
- 2 Probability
  - 2.1 Probability basics
  - 2.2 Simulations
  - 2.3 Conditional probability and independence
  - 2.4 Counting arguments
  - Vignette: Negative surveys
  - Exercises
- 3 Discrete Random Variables
  - 3.1 Probability mass functions
  - 3.2 Expected value
  - 3.3 Binomial and geometric random variables
  - 3.4 Functions of a random variable
  - 3.5 Variance, standard deviation, and independence
  - 3.6 Poisson, negative binomial, and hypergeometric
  - Vignette: Loops in R
  - Exercises
- 4 Continuous Random Variables
  - 4.1 Probability density functions
  - 4.2 Expected value
  - 4.3 Variance and standard deviation
  - 4.4 Normal random variables
  - 4.5 Uniform and exponential random variables
  - 4.6 Summary
  - Exercises
- 5 Simulation of Random Variables
  - 5.1 Estimating probabilities
  - 5.2 Estimating discrete distributions
  - 5.3 Estimating continuous distributions
  - 5.4 Central Limit Theorem
  - 5.5 Sampling distributions
  - 5.6 Point estimators
  - Vignette: Stein's paradox
  - Exercises
- 6 Data Manipulation
  - 6.1 Data frames and tibbles
  - 6.2 dplyr verbs
  - 6.3 dplyr pipelines
  - 6.4 The power of dplyr
  - 6.5 Working with character strings
  - 6.6 Structure of data
  - 6.7 The apply family
  - Vignette: dplyr murder mystery
  - Vignette: Data and gender
  - Exercises
- 7 Data Visualization with ggplot
  - 7.1 ggplot fundamentals
  - 7.2 Visualizing a single variable
  - 7.3 Visualizing two or more variables
  - 7.4 Customizing
  - Vignette: Choropleth maps
  - Vignette: COVID-19
  - Exercises
- 8 Inference on the Mean
  - 8.1 Sampling distribution of the sample mean
  - 8.2 Confidence intervals for the mean
  - 8.3 Hypothesis tests of the mean
  - 8.4 One-sided confidence intervals and hypothesis tests
  - 8.5 Assessing robustness via simulation
  - 8.6 Two sample hypothesis tests
  - 8.7 Type II errors and power
  - 8.8 Resampling
  - Exercises
- 9 Rank Based Tests
  - 9.1 One sample Wilcoxon signed rank test
  - 9.2 Two sample Wilcoxon tests
  - 9.3 Power and sample size
  - 9.4 Effect size and consistency
  - 9.5 Summary
  - Vignette: ROC curves and the Wilcoxon rank sum statistic
  - Exercises
- 10 Tabular Data
  - 10.1 Tables and plots
  - 10.2 Inference on a proportion
  - 10.3 χ² tests
  - 10.4 χ² goodness of fit
  - 10.5 χ² tests on cross tables
  - 10.6 Exact and Monte Carlo methods
  - Vignette: Tables
  - Exercises
- 11 Simple Linear Regression
  - 11.1 Least squares regression line
  - 11.2 Correlation
  - 11.3 Geometry of regression
  - 11.4 Residual analysis
  - 11.5 Inference
  - 11.6 Simulations for simple linear regression
  - 11.7 Cross validation
  - 11.8 Bias-variance tradeoff
  - Vignette: Simple logistic regression
  - Exercises
- 12 Analysis of Variance and Comparison of Multiple Groups
  - 12.1 ANOVA
  - 12.2 The ANOVA test
  - 12.3 Unequal variance
  - 12.4 Pairwise t-tests
  - Vignette: Reproducibility
  - Exercises
- 13 Multiple Regression
  - 13.1 Two explanatory variables
  - 13.2 Categorical variables
  - 13.3 Variable selection
  - Vignette: External data formats
  - Exercises
- Image Credits
- Index
- Index of Data Sets and Packages