Data Science for Infectious Disease Data Analytics: An Introduction with R
- Length: 401 pages
- Edition: 1
- Language: English
- Publisher: Chapman and Hall/CRC
- Publication Date: 2022-12-05
- ISBN-10: 1032187425
- ISBN-13: 9781032187426
- Sales Rank: #8651406
Data Science for Infectious Disease Data Analytics: An Introduction with R provides an overview of modern data science tools and methods that have been developed specifically to analyze infectious disease data. With a quick start guide to epidemiological data visualization and analysis in R, this book spans the gulf between academia and practice, providing many lively, instructive data analysis examples using the most up-to-date data, such as the newly discovered coronavirus disease (COVID-19).
The primary emphasis of this book will be the data science procedure in epidemiological studies, including data wrangling, visualization, interpretation, predictive modeling, and inference, which is of immense importance due to increasingly diverse and nonexperimental data across a wide range of fields. The knowledge and skills readers gain from this book are also transferable to other areas, such as public health, business analytics, environmental studies, or spatio-temporal data visualization and analysis in general.
Aimed at readers with an undergraduate knowledge of mathematics and statistics, this book is an ideal introduction to the development and implementation of data science in epidemiology.
Key Features:
- Describes the entire data science procedure of how infectious disease data are collected, curated, visualized, and fed to predictive models, which facilitates effective communication between data sources, scientists, and decision-makers.
- Describes practical concepts of infectious disease data and provides particular data science perspectives.
- Gives an overview of the unique features and issues of infectious disease data and how they impact epidemic modeling and projection.
- Introduces various classes of models and state-of-the-art learning methods to analyze infectious disease data, with valuable insights on how different models and methods could be connected.
Table of Contents:
Cover, Half Title, Series Page, Title Page, Copyright Page, Dedication, Contents, Preface
1. Introduction 1.1. Aims and Scope of This Book 1.2. The Structure of This Book 1.2.1. Infectious Disease Data 1.2.2. Basic Characteristics of the Infection Process 1.2.3. Data Visualization 1.2.4. Epidemic Modeling and Forecasting
2. Data Wrangling 2.1. An Introduction to R Packages “dplyr” and “tidyr” 2.2. Learning R Package “dplyr” 2.2.1. Tibbles 2.2.2. Importing Data 2.2.3. Common “dplyr” Functions 2.3. Selecting Columns and Filtering Rows 2.3.1. Subsetting Variables 2.3.2. Subsetting Observations 2.3.3. Pipes 2.3.4. Selecting Rows with Highest or Lowest Values of a Variable 2.3.5. Additional Features 2.4. Making New Variables with mutate() 2.5. Summarizing Data 2.6. Combining Datasets 2.6.1. The “Join” Family 2.6.2. Toy Examples with Joins 2.6.3. Practicing with Joins for Real Data 2.6.4. More on Combining Rows of Tables 2.7. Data Reshaping 2.7.1. From Wide to Long 2.7.2. From Long to Wide 2.8. Further Reading 2.9. Exercises
3. Data Visualization with R Package “ggplot2” 3.1. An Introduction 3.2. Types of Variables and Preparation 3.2.1. Types of Variables 3.2.2. Rules for Graph Designing 3.2.3. Installing Packages and Loading Data 3.2.4. A Simple Scatterplot 3.3. Position Scales and Axes 3.3.1. Changing the Label of the Axis 3.3.2. Changing the Range of the Axis 3.4. Color Scales and Size of geom_point() 3.4.1. Changing the Color of All Points 3.4.2. Coloring Observations by the Value of a Feature 3.4.3. Changing the Color Palette 3.4.4. Changing the Size by the Value of a Feature 3.4.5. An Example of a Row-labeled Dot Plot 3.5. Individual Geoms 3.5.1. Histograms 3.5.2. Bar Charts 3.5.3. The Default Bar Chart 3.5.4. Bar Charts with Assigned Values 3.5.5. Legends 3.5.6. Boxplots, Jittering and Violin Plots 3.6. Collective Geoms 3.6.1. Smoothers 3.7. Time Series 3.7.1. Basic Line Plots 3.7.2. Adding a Second Line 3.7.3. Adding Ribbons 3.7.4. Adjusting the Scale of the Time Axis 3.7.5. Adding Annotations 3.8. Maps 3.8.1. Making a Base Map 3.8.2. Customizing Choropleth Maps 3.8.3. Overlaying Polygon Maps 3.9. Other Useful Plots 3.9.1. Density and Conditional Density Plots 3.9.2. Adding Marginal Plots 3.10. Arranging Plots 3.10.1. Facets 3.10.2. Combining Plots Using R Package “patchwork” 3.11. Saving the Figure and Output 3.11.1. Saving in Figure Format 3.11.2. Saving in RDS Format 3.12. Further Reading 3.13. Exercises
4. Interactive Visualization 4.1. An Introduction 4.2. Creating Plotly Objects 4.2.1. Using plot_ly() to Create a Plotly Object 4.2.2. Using “dplyr” Verbs to Modify Data 4.2.3. Using ggplotly() to Create a Plotly Object 4.3. Scatterplots and Line Plots 4.3.1. Making a Scatterplot 4.3.2. Markers 4.3.3. A Single Time Series Plot 4.3.4. Hovering Text and Template 4.3.5. Multiple Time Series Plots 4.3.6. More Features About the Lines 4.3.7. Adding Ribbons 4.4. Pie Charts 4.4.1. Draw Static Pie Charts 4.4.2. Drawing Interactive Pie Charts 4.5. Animation 4.5.1. An Animation of the Evolution of Infected vs. Death Count 4.5.2. An Animation of the State-level Time Series Plot of Infected Count 4.6. Saving HTMLs 4.6.1. Saving as Standalone HTML Files 4.6.2. Saving as Non-self-contained HTML Files 4.7. Further Reading 4.8. Exercises
5. R Shiny 5.1. An Introduction to Shiny 5.1.1. Structure of a Shiny Application 5.1.2. Launching a Shiny Application 5.1.3. Creating the First Shiny Application 5.1.4. Creating a New Shiny Application in RStudio 5.1.5. Sharing the Shiny Application 5.2. Useful Input Widgets 5.3. Displaying Reactive Outputs 5.4. Rendering Plotly inside Shiny 5.5. Further Reading
6. Interactive Geospatial Visualization 6.1. An Introduction to Leaflet 6.1.1. Features and Installation 6.1.2. Basic Usage 6.2. The Data Object 6.2.1. Specifying Latitude/Longitude in Base R 6.2.2. Using R Package “sp” 6.2.3. Using R Package “maps” 6.3. Choropleth Maps 6.3.1. Creating a Base Map 6.3.2. Coloring the Map 6.3.3. Interactive Maps 6.4. Legends 6.4.1. Classification Schemes 6.4.2. Mapping Variables to Colors 6.5. Examples of County-level Maps 6.5.1. A County-level Map of COVID-19 Infection Risk 6.5.2. A County-level Map of COVID-19 Control Policy 6.6. Spot Maps 6.6.1. Adding Circles 6.6.2. Adding Popups 6.6.3. Adding Labels 6.7. Integrating Leaflet with R Shiny 6.8. Further Reading 6.9. Exercises
7. Epidemic Modeling 7.1. An Introduction to Epidemic Modeling 7.2. Mechanistic Models 7.2.1. Compartment Modeling 7.2.2. Agent-based Methods 7.3. Phenomenological Models 7.3.1. Time Series Analysis 7.3.2. Regression Methods 7.3.3. Machine Learning Methods 7.4. Hybrid Models and Ensemble Methods 7.5. Epidemic Modeling: Mathematical and Statistical Perspectives 7.6. Some Terms in Epidemic Modeling 7.7. Further Reading
8. Compartment Models 8.1. SIS Models 8.2. SIR Models 8.3. SIR Models with Births and Deaths 8.4. SEIR Models 8.5. Parameter Estimation for Compartment Models 8.5.1. Least-squares Method 8.5.2. Maximum Likelihood Method 8.6. Implementation of Parameter Estimation in R 8.6.1. An Application to Influenza-like Illness Data 8.6.2. An Application to COVID-19 Data 8.7. Basic and Effective Reproduction Number 8.7.1. Basic Reproduction Number 8.7.2. Effective Reproduction Number 8.7.3. Herd Immunity 8.8. Further Reading 8.9. Exercises
9. Time Series Analysis of Infectious Disease Data 9.1. Datasets and R Packages 9.1.1. Data 9.1.2. R Package “fable” 9.2. An Introduction to Time Series Analysis 9.2.1. Tsibble Objects 9.2.2. Working with Tsibble Objects 9.2.3. Drawing Time Series Plots 9.2.4. Objectives of Time Series Analysis 9.2.5. Stationarity 9.2.6. Autocovariance and Autocorrelation 9.3. Time Series Decomposition 9.3.1. Box-Cox Transformations 9.3.2. Methods for Estimating the Trend 9.3.3. Seasonal Component 9.3.4. Trend and Seasonality Estimation 9.4. Simple Time Series Forecasting Approaches 9.4.1. Average Method 9.4.2. Random Walk Forecasts 9.4.3. Seasonal Random Walk Forecasts 9.4.4. Random Walk with Drift Method 9.4.5. Displaying All Forecasting Results 9.4.6. Distributional Forecasts and Prediction Intervals 9.5. Residual Diagnostics and Accuracy Evaluation 9.5.1. Residual Diagnostics 9.5.2. Forecasting Accuracy 9.5.3. Selection of the Time Series 9.6. ARIMA Models 9.6.1. Differencing 9.6.2. ARMA Models 9.6.3. ARIMA Models 9.6.4. Seasonal ARIMA (SARIMA) Model 9.6.5. Building SARIMA Models 9.7. Model Comparison 9.7.1. Exponential Smoothing and ARIMA Models 9.7.2. Cross-validation for Time Series Analysis 9.8. Ensuring Forecasts Stay within Limits 9.8.1. Positive Forecasts 9.8.2. Forecasts Constrained to an Interval 9.9. Prediction and Prediction Intervals for Aggregates 9.10. Outliers and Anomalies 9.10.1. Empirical Rule 9.10.2. Boxplots 9.10.3. Outliers in Time Series 9.10.4. Tidy Anomaly Detection for Time Series with “anomalize” 9.10.5. A Discussion on Outlier and Anomalies Repair 9.11. Further Reading 9.12. Exercises
10. Regression Methods 10.1. Parametric Regression Methods 10.1.1. Linear Regression and Nonlinear Regression 10.1.2. Model Adequacy Checking 10.2. Nonparametric Regression Methods 10.2.1. Piecewise Constant Splines 10.2.2. Truncated Power Splines 10.2.3. B-splines and Natural Splines 10.2.4. Smoothing Splines 10.3. An Application to CDC FluView Portal Data 10.3.1. Trigonometric Regression 10.3.2. Smoothing Splines 10.4. Poisson Regression 10.4.1. Poisson Regression 10.4.2. Zero-inflated Poisson Regression 10.4.3. Count Time Series Analysis 10.5. Logistic Regression 10.5.1. Odds and Odds Ratios 10.5.2. Estimating Logistic Regression Coefficients 10.5.3. Logistic Regression with Multiple Explanatory Variables 10.6. Further Reading 10.7. Exercises
11. Neural Networks 11.1. A Single Neuron 11.2. Neural Network Structure 11.3. Neural Network Training 11.3.1. Forward Propagation 11.3.2. Backpropagation 11.4. Overfitting 11.5. Neural Network Auto-Regressive (NNAR) Models 11.6. COVID-19 Forecasting Using NNAR 11.7. Further Reading 11.8. Exercises
12. Hybrid Models 12.1. Ensembling Time Series Models 12.2. R Package “forecastHybrid” 12.2.1. Installation 12.2.2. An Introduction 12.2.3. Model Diagnostics 12.2.4. Forecasting 12.2.5. Performing Cross-Validation on a Time Series 12.2.6. Weights Selection Using Cross-Validation 12.3. R Package “opera” 12.3.1. Installation 12.3.2. An Introduction 12.4. Further Reading 12.5. Exercises
A. Appendix A A.1. R Introduction and Preliminaries A.1.1. The R Environment and Language A.1.2. Obtaining R, RStudio and Installation A.2. Starting RStudio A.2.1. Source Pane A.2.2. Console Pane A.2.3. Error Messages A.2.4. R Help A.2.5. R Packages A.2.6. Creating a Project and Setting a Working Directory A.3. Exporting and Importing Data A.3.1. Data Export A.3.2. Data Import A.3.3. The read.csv() Function A.3.4. The “readr” Package A.3.5. Importing an Excel File into R A.3.6. Accessing Built-in Datasets A.4. Control Structures in R A.4.1. Grouped Expressions and Control Structures A.4.2. Iterations
B. Appendix B B.1. COVID-19 Data and Factors Integrated from Multiple Sources B.1.1. Epidemic Data B.1.2. Other Factors B.1.3. Datasets B.2. CDC FluView Portal Data
C. Appendix C C.1. Classes: R Dates and Times C.2. Formatting Date and Date/Time Variables C.3. Creating Date/Time Objects in R C.4. Parsing Date and Time C.4.1. Date-time Conversion to and from Character Using Base R Functions C.4.2. Parsing Date and Time Using “lubridate” C.5. Setting and Extracting Information C.5.1. Epidemiological Calendar C.6. Merging Separate Date Information C.7. Date Calculations in R
Bibliography
Index