R for Health Data Science
- Length: 344 pages
- Edition: 1
- Language: English
- Publisher: Chapman and Hall/CRC
- Publication Date: 2020-11-17
- ISBN-10: 0367428199
- ISBN-13: 9780367428198
- Sales Rank: #482959
In this age of information, the manipulation, analysis, and interpretation of data have become a fundamental part of professional life; nowhere more so than in the delivery of healthcare. From the understanding of disease and the development of new treatments, to the diagnosis and management of individual patients, the use of data and technology is now an integral part of the business of healthcare. Those working in healthcare interact daily with data, often without realising it. The conversion of this avalanche of information to useful knowledge is essential for high-quality patient care.
R for Health Data Science includes everything a healthcare professional needs to go from R novice to R guru. By the end of this book, you will be taking a sophisticated approach to health data science with beautiful visualisations, elegant tables, and nuanced analyses.
Features
- Provides an introduction to the fundamentals of R for healthcare professionals
- Highlights the most popular statistical approaches to health data science
- Written to be as accessible as possible with minimal mathematics
- Emphasises the importance of truly understanding the underlying data through the use of plots
- Includes numerous examples that can be adapted for your own data
- Helps you create publishable documents and collaborate across teams
With this book, you are in safe hands – Prof. Harrison is a clinician and Dr. Pius is a data scientist, bringing 25 years’ combined experience of using R at the coal face. This content has been taught to hundreds of individuals from a variety of backgrounds, from rank beginners to experts moving to R from other platforms.
Table of Contents
Cover, Half Title, Title Page, Copyright Page, Dedication, Preface, About the Authors
I Data wrangling and visualisation
- 1 Why we love R 1.1 Help, what's a script? 1.2 What is RStudio? 1.3 Getting started 1.4 Getting help 1.5 Work in a Project 1.6 Restart R regularly 1.7 Notation throughout this book
- 2 R basics 2.1 Reading data into R 2.1.1 Import Dataset interface 2.1.2 Reading in the Global Burden of Disease example dataset 2.2 Variable types and why we care 2.2.1 Numeric variables (continuous) 2.2.2 Character variables 2.2.3 Factor variables (categorical) 2.2.4 Date/time variables 2.3 Objects and functions 2.3.1 data frame/tibble 2.3.2 Naming objects 2.3.3 Function and its arguments 2.3.4 Working with objects 2.3.5 <- and = 2.3.6 Recap: object, function, input, argument 2.4 Pipe - %>% 2.4.1 Using . to direct the pipe 2.5 Operators for filtering data 2.5.1 Worked examples 2.6 The combine function: c() 2.7 Missing values (NAs) and filters 2.8 Creating new columns - mutate() 2.8.1 Worked example/exercise 2.9 Conditional calculations - if_else() 2.10 Create labels - paste() 2.11 Joining multiple datasets 2.11.1 Further notes about joins
- 3 Summarising data 3.1 Get the data 3.2 Plot the data 3.3 Aggregating: group_by(), summarise() 3.4 Add new columns: mutate() 3.4.1 Percentages formatting: percent() 3.5 summarise() vs mutate() 3.6 Common arithmetic functions - sum(), mean(), median(), etc. 3.7 select() columns 3.8 Reshaping data - long vs wide format 3.8.1 Pivot values from rows into columns (wider) 3.8.2 Pivot values from columns to rows (longer) 3.8.3 separate() a column into multiple columns 3.9 arrange() rows 3.9.1 Factor levels 3.10 Exercises 3.10.1 Exercise - pivot_wider() 3.10.2 Exercise - group_by(), summarise() 3.10.3 Exercise - full_join(), percent() 3.10.4 Exercise - mutate(), summarise() 3.10.5 Exercise - filter(), summarise(), pivot_wider()
- 4 Different types of plots 4.1 Get the data 4.2 Anatomy of ggplot explained 4.3 Set your theme - grey vs white 4.4 Scatter plots/bubble plots 4.5 Line plots/time series plots 4.5.1 Exercise 4.6 Bar plots 4.6.1 Summarised data 4.6.2 Countable data 4.6.3 colour vs fill 4.6.4 Proportions 4.6.5 Exercise 4.7 Histograms 4.8 Box plots 4.9 Multiple geoms, multiple aes() 4.9.1 Worked example - three geoms together 4.10 All other types of plots 4.11 Solutions 4.12 Extra: Advanced examples
- 5 Fine tuning plots 5.1 Get the data 5.2 Scales 5.2.1 Logarithmic 5.2.2 Expand limits 5.2.3 Zoom in 5.2.4 Exercise 5.2.5 Axis ticks 5.3 Colours 5.3.1 Using the Brewer palettes: 5.3.2 Legend title 5.3.3 Choosing colours manually 5.4 Titles and labels 5.4.1 Annotation 5.4.2 Annotation with a superscript and a variable 5.5 Overall look - theme() 5.5.1 Text size 5.5.2 Legend position 5.6 Saving your plot
II Data analysis
- 6 Working with continuous outcome variables 6.1 Continuous data 6.2 The Question 6.3 Get and check the data 6.4 Plot the data 6.4.1 Histogram 6.4.2 Quantile-quantile (Q-Q) plot 6.4.3 Boxplot 6.5 Compare the means of two groups 6.5.1 t-test 6.5.2 Two-sample t-tests 6.5.3 Paired t-tests 6.5.4 What if I run the wrong test? 6.6 Compare the mean of one group: one sample t-tests 6.6.1 Interchangeability of t-tests 6.7 Compare the means of more than two groups 6.7.1 Plot the data 6.7.2 ANOVA 6.7.3 Assumptions 6.8 Multiple testing 6.8.1 Pairwise testing and multiple comparisons 6.9 Non-parametric tests 6.9.1 Transforming data 6.9.2 Non-parametric test for comparing two groups 6.9.3 Non-parametric test for comparing more than two groups 6.10 Finalfit approach 6.11 Conclusions 6.12 Exercises 6.12.1 Exercise 6.12.2 Exercise 6.12.3 Exercise 6.12.4 Exercise 6.13 Solutions
- 7 Linear regression 7.1 Regression 7.1.1 The Question (1) 7.1.2 Fitting a regression line 7.1.3 When the line fits well 7.1.4 The fitted line and the linear equation 7.1.5 Effect modification 7.1.6 R-squared and model fit 7.1.7 Confounding 7.1.8 Summary 7.2 Fitting simple models 7.2.1 The Question (2) 7.2.2 Get the data 7.2.3 Check the data 7.2.4 Plot the data 7.2.5 Simple linear regression 7.2.6 Multivariable linear regression 7.2.7 Check assumptions 7.3 Fitting more complex models 7.3.1 The Question (3) 7.3.2 Model fitting principles 7.3.3 AIC 7.3.4 Get the data 7.3.5 Check the data 7.3.6 Plot the data 7.3.7 Linear regression with finalfit 7.3.8 Summary 7.4 Exercises 7.4.1 Exercise 7.4.2 Exercise 7.4.3 Exercise 7.4.4 Exercise 7.5 Solutions
- 8 Working with categorical outcome variables 8.1 Factors 8.2 The Question 8.3 Get the data 8.4 Check the data 8.5 Recode the data 8.6 Should I convert a continuous variable to a categorical variable? 8.6.1 Equal intervals vs quantiles 8.7 Plot the data 8.8 Group factor levels together - fct_collapse() 8.9 Change the order of values within a factor - fct_relevel() 8.10 Summarising factors with finalfit 8.11 Pearson's chi-squared and Fisher's exact tests 8.11.1 Base R 8.12 Fisher's exact test 8.13 Chi-squared / Fisher's exact test using finalfit 8.14 Exercises 8.14.1 Exercise 8.14.2 Exercise 8.14.3 Exercise
- 9 Logistic regression 9.1 Generalised linear modelling 9.2 Binary logistic regression 9.2.1 The Question (1) 9.2.2 Odds and probabilities 9.2.3 Odds ratios 9.2.4 Fitting a regression line 9.2.5 The fitted line and the logistic regression equation 9.2.6 Effect modification and confounding 9.3 Data preparation and exploratory analysis 9.3.1 The Question (2) 9.3.2 Get the data 9.3.3 Check the data 9.3.4 Recode the data 9.3.5 Plot the data 9.3.6 Tabulate data 9.4 Model assumptions 9.4.1 Linearity of continuous variables to the response 9.4.2 Multicollinearity 9.5 Fitting logistic regression models in base R 9.6 Modelling strategy for binary outcomes 9.7 Fitting logistic regression models with finalfit 9.7.1 Criterion-based model fitting 9.8 Model fitting 9.8.1 Odds ratio plot 9.9 Correlated groups of observations 9.9.1 Simulate data 9.9.2 Plot the data 9.9.3 Mixed effects models in base R 9.10 Exercises 9.10.1 Exercise 9.10.2 Exercise 9.10.3 Exercise 9.10.4 Exercise 9.11 Solutions
- 10 Time-to-event data and survival 10.1 The Question 10.2 Get and check the data 10.3 Death status 10.4 Time and censoring 10.5 Recode the data 10.6 Kaplan Meier survival estimator 10.6.1 KM analysis for whole cohort 10.6.2 Model 10.6.3 Life table 10.7 Kaplan Meier plot 10.8 Cox proportional hazards regression 10.8.1 coxph() 10.8.2 finalfit() 10.8.3 Reduced model 10.8.4 Testing for proportional hazards 10.8.5 Stratified models 10.8.6 Correlated groups of observations 10.8.7 Hazard ratio plot 10.9 Competing risks regression 10.10 Summary 10.11 Dates in R 10.11.1 Converting dates to survival time 10.12 Exercises 10.12.1 Exercise 10.12.2 Exercise 10.13 Solutions
III Workflow
- 11 The problem of missing data 11.1 Identification of missing data 11.1.1 Missing completely at random (MCAR) 11.1.2 Missing at random (MAR) 11.1.3 Missing not at random (MNAR) 11.2 Ensure your data are coded correctly: ff_glimpse() 11.2.1 The Question 11.3 Identify missing values in each variable: missing_plot() 11.4 Look for patterns of missingness: missing_pattern() 11.5 Including missing data in demographics tables 11.6 Check for associations between missing and observed data 11.6.1 For those who like an omnibus test 11.7 Handling missing data: MCAR 11.7.1 Common solution: row-wise deletion 11.7.2 Other considerations 11.8 Handling missing data: MAR 11.8.1 Common solution: Multivariate Imputation by Chained Equations (mice) 11.9 Handling missing data: MNAR 11.10 Summary
- 12 Notebooks and Markdown 12.1 What is a Notebook? 12.2 What is Markdown? 12.3 What is the difference between a Notebook and an R Markdown file? 12.4 Notebook vs HTML vs PDF vs Word 12.5 The anatomy of a Notebook / R Markdown file 12.5.1 YAML header 12.5.2 R code chunks 12.5.3 Setting default chunk options 12.5.4 Setting default figure options 12.5.5 Markdown elements 12.6 Interface and outputting 12.6.1 Running code and chunks, knitting 12.7 File structure and workflow 12.7.1 Why go to all this bother?
- 13 Exporting and reporting 13.1 Which format should I use? 13.2 Working in a .R file 13.3 Demographics table 13.4 Logistic regression table 13.5 Odds ratio plot 13.6 MS Word via knitr/R Markdown 13.6.1 Figure quality in Word output 13.7 Create Word template file 13.8 PDF via knitr/R Markdown 13.9 Working in a .Rmd file 13.10 Moving between formats 13.11 Summary
- 14 Version control 14.1 Setup Git on RStudio and associate with GitHub 14.2 Create an SSH RSA key and add to your GitHub account 14.3 Create a project in RStudio and commit a file 14.4 Create a new repository on GitHub and link to RStudio project 14.5 Clone an existing GitHub project to new RStudio project 14.6 Summary
- 15 Encryption 15.1 Safe practice 15.2 encryptr package 15.3 Get the package 15.4 Get the data 15.5 Generate private/public keys 15.6 Encrypt columns of data 15.7 Decrypt specific information only 15.8 Using a lookup table 15.9 Encrypting a file 15.10 Decrypting a file 15.11 Ciphertexts are not matchable 15.12 Providing a public key 15.13 Use cases 15.13.1 Blinding in trials 15.13.2 Re-contacting participants 15.13.3 Long-term follow-up of participants 15.14 Summary
Appendix, Bibliography, Index