R for Health Data Science

Length: 344 pages
Edition: 1
Language: English
Publisher: Chapman and Hall/CRC
Publication Date: 2020-11-17
ISBN-10: 0367428199
ISBN-13: 9780367428198
Sales Rank: #482959

In this age of information, the manipulation, analysis, and interpretation of data have become a fundamental part of professional life; nowhere more so than in the delivery of healthcare. From the understanding of disease and the development of new treatments, to the diagnosis and management of individual patients, the use of data and technology is now an integral part of the business of healthcare. Those working in healthcare interact daily with data, often without realising it. The conversion of this avalanche of information to useful knowledge is essential for high-quality patient care.

R for Health Data Science includes everything a healthcare professional needs to go from R novice to R guru. By the end of this book, you will be taking a sophisticated approach to health data science with beautiful visualisations, elegant tables, and nuanced analyses.

Features

Provides an introduction to the fundamentals of R for healthcare professionals
Highlights the most popular statistical approaches to health data science
Written to be as accessible as possible with minimal mathematics
Emphasises the importance of truly understanding the underlying data through the use of plots
Includes numerous examples that can be adapted for your own data
Helps you create publishable documents and collaborate across teams

With this book, you are in safe hands – Prof. Harrison is a clinician and Dr. Pius is a data scientist, bringing 25 years’ combined experience of using R at the coal face. This content has been taught to hundreds of individuals from a variety of backgrounds, from rank beginners to experts moving to R from other platforms.

Cover
Half Title
Title Page
Copyright Page
Dedication
Preface
About the Authors
I Data wrangling and visualisation
    1 Why we love R
        1.1 Help, what's a script?
        1.2 What is RStudio?
        1.3 Getting started
        1.4 Getting help
        1.5 Work in a Project
        1.6 Restart R regularly
        1.7 Notation throughout this book
    2 R basics
        2.1 Reading data into R
            2.1.1 Import Dataset interface
            2.1.2 Reading in the Global Burden of Disease example dataset
        2.2 Variable types and why we care
            2.2.1 Numeric variables (continuous)
            2.2.2 Character variables
            2.2.3 Factor variables (categorical)
            2.2.4 Date/time variables
        2.3 Objects and functions
            2.3.1 data frame/tibble
            2.3.2 Naming objects
            2.3.3 Function and its arguments
            2.3.4 Working with objects
            2.3.5 <- and =
            2.3.6 Recap: object, function, input, argument
        2.4 Pipe - %>%
            2.4.1 Using . to direct the pipe
        2.5 Operators for filtering data
            2.5.1 Worked examples
        2.6 The combine function: c()
        2.7 Missing values (NAs) and filters
        2.8 Creating new columns - mutate()
            2.8.1 Worked example/exercise
        2.9 Conditional calculations - if_else()
        2.10 Create labels - paste()
        2.11 Joining multiple datasets
            2.11.1 Further notes about joins
    3 Summarising data
        3.1 Get the data
        3.2 Plot the data
        3.3 Aggregating: group_by(), summarise()
        3.4 Add new columns: mutate()
            3.4.1 Percentages formatting: percent()
        3.5 summarise() vs mutate()
        3.6 Common arithmetic functions - sum(), mean(), median(), etc.
        3.7 select() columns
        3.8 Reshaping data - long vs wide format
            3.8.1 Pivot values from rows into columns (wider)
            3.8.2 Pivot values from columns to rows (longer)
            3.8.3 separate() a column into multiple columns
        3.9 arrange() rows
            3.9.1 Factor levels
        3.10 Exercises
            3.10.1 Exercise - pivot_wider()
            3.10.2 Exercise - group_by(), summarise()
            3.10.3 Exercise - full_join(), percent()
            3.10.4 Exercise - mutate(), summarise()
            3.10.5 Exercise - filter(), summarise(), pivot_wider()
    4 Different types of plots
        4.1 Get the data
        4.2 Anatomy of ggplot explained
        4.3 Set your theme - grey vs white
        4.4 Scatter plots/bubble plots
        4.5 Line plots/time series plots
            4.5.1 Exercise
        4.6 Bar plots
            4.6.1 Summarised data
            4.6.2 Countable data
            4.6.3 colour vs fill
            4.6.4 Proportions
            4.6.5 Exercise
        4.7 Histograms
        4.8 Box plots
        4.9 Multiple geoms, multiple aes()
            4.9.1 Worked example - three geoms together
        4.10 All other types of plots
        4.11 Solutions
        4.12 Extra: Advanced examples
    5 Fine tuning plots
        5.1 Get the data
        5.2 Scales
            5.2.1 Logarithmic
            5.2.2 Expand limits
            5.2.3 Zoom in
            5.2.4 Exercise
            5.2.5 Axis ticks
        5.3 Colours
            5.3.1 Using the Brewer palettes:
            5.3.2 Legend title
            5.3.3 Choosing colours manually
        5.4 Titles and labels
            5.4.1 Annotation
            5.4.2 Annotation with a superscript and a variable
        5.5 Overall look - theme()
            5.5.1 Text size
            5.5.2 Legend position
        5.6 Saving your plot
II Data analysis
    6 Working with continuous outcome variables
        6.1 Continuous data
        6.2 The Question
        6.3 Get and check the data
        6.4 Plot the data
            6.4.1 Histogram
            6.4.2 Quantile-quantile (Q-Q) plot
            6.4.3 Boxplot
        6.5 Compare the means of two groups
            6.5.1 t-test
            6.5.2 Two-sample t-tests
            6.5.3 Paired t-tests
            6.5.4 What if I run the wrong test?
        6.6 Compare the mean of one group: one sample t-tests
            6.6.1 Interchangeability of t-tests
        6.7 Compare the means of more than two groups
            6.7.1 Plot the data
            6.7.2 ANOVA
            6.7.3 Assumptions
        6.8 Multiple testing
            6.8.1 Pairwise testing and multiple comparisons
        6.9 Non-parametric tests
            6.9.1 Transforming data
            6.9.2 Non-parametric test for comparing two groups
            6.9.3 Non-parametric test for comparing more than two groups
        6.10 Finalfit approach
        6.11 Conclusions
        6.12 Exercises
            6.12.1 Exercise
            6.12.2 Exercise
            6.12.3 Exercise
            6.12.4 Exercise
        6.13 Solutions
    7 Linear regression
        7.1 Regression
            7.1.1 The Question (1)
            7.1.2 Fitting a regression line
            7.1.3 When the line fits well
            7.1.4 The fitted line and the linear equation
            7.1.5 Effect modification
            7.1.6 R-squared and model fit
            7.1.7 Confounding
            7.1.8 Summary
        7.2 Fitting simple models
            7.2.1 The Question (2)
            7.2.2 Get the data
            7.2.3 Check the data
            7.2.4 Plot the data
            7.2.5 Simple linear regression
            7.2.6 Multivariable linear regression
            7.2.7 Check assumptions
        7.3 Fitting more complex models
            7.3.1 The Question (3)
            7.3.2 Model fitting principles
            7.3.3 AIC
            7.3.4 Get the data
            7.3.5 Check the data
            7.3.6 Plot the data
            7.3.7 Linear regression with finalfit
            7.3.8 Summary
        7.4 Exercises
            7.4.1 Exercise
            7.4.2 Exercise
            7.4.3 Exercise
            7.4.4 Exercise
        7.5 Solutions
    8 Working with categorical outcome variables
        8.1 Factors
        8.2 The Question
        8.3 Get the data
        8.4 Check the data
        8.5 Recode the data
        8.6 Should I convert a continuous variable to a categorical variable?
            8.6.1 Equal intervals vs quantiles
        8.7 Plot the data
        8.8 Group factor levels together - fct_collapse()
        8.9 Change the order of values within a factor - fct_relevel()
        8.10 Summarising factors with finalfit
        8.11 Pearson's chi-squared and Fisher's exact tests
            8.11.1 Base R
        8.12 Fisher's exact test
        8.13 Chi-squared / Fisher's exact test using finalfit
        8.14 Exercises
            8.14.1 Exercise
            8.14.2 Exercise
            8.14.3 Exercise
    9 Logistic regression
        9.1 Generalised linear modelling
        9.2 Binary logistic regression
            9.2.1 The Question (1)
            9.2.2 Odds and probabilities
            9.2.3 Odds ratios
            9.2.4 Fitting a regression line
            9.2.5 The fitted line and the logistic regression equation
            9.2.6 Effect modification and confounding
        9.3 Data preparation and exploratory analysis
            9.3.1 The Question (2)
            9.3.2 Get the data
            9.3.3 Check the data
            9.3.4 Recode the data
            9.3.5 Plot the data
            9.3.6 Tabulate data
        9.4 Model assumptions
            9.4.1 Linearity of continuous variables to the response
            9.4.2 Multicollinearity
        9.5 Fitting logistic regression models in base R
        9.6 Modelling strategy for binary outcomes
        9.7 Fitting logistic regression models with finalfit
            9.7.1 Criterion-based model fitting
        9.8 Model fitting
            9.8.1 Odds ratio plot
        9.9 Correlated groups of observations
            9.9.1 Simulate data
            9.9.2 Plot the data
            9.9.3 Mixed effects models in base R
        9.10 Exercises
            9.10.1 Exercise
            9.10.2 Exercise
            9.10.3 Exercise
            9.10.4 Exercise
        9.11 Solutions
    10 Time-to-event data and survival
        10.1 The Question
        10.2 Get and check the data
        10.3 Death status
        10.4 Time and censoring
        10.5 Recode the data
        10.6 Kaplan Meier survival estimator
            10.6.1 KM analysis for whole cohort
            10.6.2 Model
            10.6.3 Life table
        10.7 Kaplan Meier plot
        10.8 Cox proportional hazards regression
            10.8.1 coxph()
            10.8.2 finalfit()
            10.8.3 Reduced model
            10.8.4 Testing for proportional hazards
            10.8.5 Stratified models
            10.8.6 Correlated groups of observations
            10.8.7 Hazard ratio plot
        10.9 Competing risks regression
        10.10 Summary
        10.11 Dates in R
            10.11.1 Converting dates to survival time
        10.12 Exercises
            10.12.1 Exercise
            10.12.2 Exercise
        10.13 Solutions
III Workflow
    11 The problem of missing data
        11.1 Identification of missing data
            11.1.1 Missing completely at random (MCAR)
            11.1.2 Missing at random (MAR)
            11.1.3 Missing not at random (MNAR)
        11.2 Ensure your data are coded correctly: ff_glimpse()
            11.2.1 The Question
        11.3 Identify missing values in each variable: missing_plot()
        11.4 Look for patterns of missingness: missing_pattern()
        11.5 Including missing data in demographics tables
        11.6 Check for associations between missing and observed data
            11.6.1 For those who like an omnibus test
        11.7 Handling missing data: MCAR
            11.7.1 Common solution: row-wise deletion
            11.7.2 Other considerations
        11.8 Handling missing data: MAR
            11.8.1 Common solution: Multivariate Imputation by Chained Equations (mice)
        11.9 Handling missing data: MNAR
        11.10 Summary
    12 Notebooks and Markdown
        12.1 What is a Notebook?
        12.2 What is Markdown?
        12.3 What is the difference between a Notebook and an R Markdown file?
        12.4 Notebook vs HTML vs PDF vs Word
        12.5 The anatomy of a Notebook / R Markdown file
            12.5.1 YAML header
            12.5.2 R code chunks
            12.5.3 Setting default chunk options
            12.5.4 Setting default figure options
            12.5.5 Markdown elements
        12.6 Interface and outputting
            12.6.1 Running code and chunks, knitting
        12.7 File structure and workflow
            12.7.1 Why go to all this bother?
    13 Exporting and reporting
        13.1 Which format should I use?
        13.2 Working in a .R file
        13.3 Demographics table
        13.4 Logistic regression table
        13.5 Odds ratio plot
        13.6 MS Word via knitr/R Markdown
            13.6.1 Figure quality in Word output
        13.7 Create Word template file
        13.8 PDF via knitr/R Markdown
        13.9 Working in a .Rmd file
        13.10 Moving between formats
        13.11 Summary
    14 Version control
        14.1 Setup Git on RStudio and associate with GitHub
        14.2 Create an SSH RSA key and add to your GitHub account
        14.3 Create a project in RStudio and commit a file
        14.4 Create a new repository on GitHub and link to RStudio project
        14.5 Clone an existing GitHub project to new RStudio project
        14.6 Summary
    15 Encryption
        15.1 Safe practice
        15.2 encryptr package
        15.3 Get the package
        15.4 Get the data
        15.5 Generate private/public keys
        15.6 Encrypt columns of data
        15.7 Decrypt specific information only
        15.8 Using a lookup table
        15.9 Encrypting a file
        15.10 Decrypting a file
        15.11 Ciphertexts are not matchable
        15.12 Providing a public key
        15.13 Use cases
            15.13.1 Blinding in trials
            15.13.2 Re-contacting participants
            15.13.3 Long-term follow-up of participants
        15.14 Summary
Appendix
Bibliography
Index