Machine Learning for Knowledge Discovery with R: Methodologies for Modeling, Inference and Prediction
- Length: 260 pages
- Edition: 1
- Language: English
- Publisher: Chapman and Hall/CRC
- Publication Date: 2021-09-15
- ISBN-10: 1032065362
- ISBN-13: 9781032065366
Machine Learning for Knowledge Discovery with R contains methodologies and examples for statistical modelling, inference, and prediction in data analysis. It covers many recent supervised and unsupervised machine learning methodologies, such as recursive partitioning modelling, regularized regression, support vector machines, neural networks, clustering, and causal-effect inference. It also emphasizes statistical thinking in data analysis, the use of statistical graphs to explore data structure, and the presentation of results. The book includes many real-world data examples from life science, finance, and other fields to illustrate the applications of the methods described therein.
Key Features:
- Contains statistical theory for the most recent supervised and unsupervised machine learning methodologies.
- Emphasizes broad statistical thinking, judgment, graphical methods, and collaboration with subject-matter experts in analysis, interpretation, and presentation.
- Written by a statistical data analysis practitioner, for practitioners.
The book is suitable for an upper-level undergraduate or graduate-level data analysis course. It also serves as a useful desk reference for data analysts in scientific research or industrial applications.
Table of Contents:
Cover
Half Title
Title Page
Copyright Page
Dedication
Contents
Preface
1. Statistical Data Analysis
  1.1. Perspectives of Data Analysis
  1.2. Strategies and Stages of Data Analysis
  1.3. Data Quality
    1.3.1. Heterogeneity in Data Sources
      1.3.1.1. Heterogeneity in Study Subject Populations
      1.3.1.2. Heterogeneity in Data due to Timing of Generations
    1.3.2. Noise Accumulation
    1.3.3. Spurious Correlation
    1.3.4. Missing Data
  1.4. Data Sets Analyzed in This Book
    1.4.1. NCI-60
    1.4.2. Riboflavin Production with Bacillus Subtilis
    1.4.3. TCGA
    1.4.4. The Boston Housing Data Set
2. Examining Data Distribution
  2.1. One Dimension
    2.1.1. Histogram, Stem-and-Leaf, Density Plot
    2.1.2. Box Plot
    2.1.3. Quantile-Quantile (Q-Q) Plot, Normal Plot, Probability-Probability (P-P) Plot
  2.2. Two Dimensions
    2.2.1. Scatter Plot
    2.2.2. Ellipse - Visualization of Covariance and Correlation
    2.2.3. Multivariate Normality Test
  2.3. More Than Two Dimensions
    2.3.1. Scatter Plot Matrix
    2.3.2. Andrews's Plot
    2.3.3. Conditional Plot
  2.4. Visualization of Categorical Data
    2.4.1. Mosaic Plot
    2.4.2. Association Plot
3. Regression with Shrinkage
  3.1. Ridge Regression
  3.2. Lasso
    3.2.1. Example: Lasso on Continuous Data
    3.2.2. Example: Lasso on Binary Data
    3.2.3. Example: Lasso on Survival Data
  3.3. Group Lasso
    3.3.1. Example: Group Lasso on Gene Signatures
  3.4. Sparse Group Lasso
    3.4.1. Example: Lasso, Group Lasso, Sparse Group Lasso on Simulated Continuous Data
    3.4.2. Example: Lasso, Group Lasso, Sparse Group Lasso on Gene Signatures Continuous Data
  3.5. Adaptive Lasso
    3.5.1. Example: Adaptive Lasso on Continuous Data
    3.5.2. Example: Adaptive Lasso on Binary Data
  3.6. Elastic Net
    3.6.1. Example: Elastic Net on Continuous Data
    3.6.2. Example: Elastic Net on Binary Data
  3.7. The Sure Screening Method
    3.7.1. The Sure Screening Method
    3.7.2. Sure Independence Screening on Model Selection
    3.7.3. Example: SIS on Continuous Data
    3.7.4. Example: SIS on Survival Data
  3.8. Identify Minimal Class of Models
    3.8.1. Analysis Using Minimal Models
4. Recursive Partitioning Modeling
  4.1. Recursive Partitioning Modeling via Trees
    4.1.1. Elements of Growing a Tree
      4.1.1.1. Grow a Tree
    4.1.2. The Impurity Function
      4.1.2.1. Definition of Impurity Function
      4.1.2.2. Measure of Node Impurity - the Gini Index
    4.1.3. Misclassification Cost
    4.1.4. Size of Trees
    4.1.5. Example of Recursive Partitioning
      4.1.5.1. Recursive Partitioning with Binary Outcomes
      4.1.5.2. Recursive Partitioning with Continuous Outcomes
      4.1.5.3. Recursive Partitioning for Survival Outcomes
  4.2. Random Forest
    4.2.1. Mechanism of Action of Random Forests
    4.2.2. Variable Importance
    4.2.3. Random Forests for Regression
    4.2.4. Example of Random Forest Data Analysis
      4.2.4.1. randomForest for Binary Data
      4.2.4.2. randomForest for Continuous Data
  4.3. Random Survival Forest
    4.3.1. Algorithm to Construct RSF
    4.3.2. Individual and Ensemble Estimate at Terminal Nodes
    4.3.3. VIMP
    4.3.4. Example
  4.4. XGBoost: A Tree Boosting System
    4.4.1. Example Using xgboost for Data Analysis
      4.4.1.1. xgboost for Binary Data
      4.4.1.2. xgboost for Continuous Data
    4.4.2. Example - xgboost for Cox Regression
  4.5. Model-based Recursive Partitioning
    4.5.1. The Recursive Partitioning Algorithm
    4.5.2. Example
  4.6. Recursive Partition for Longitudinal Data
    4.6.1. Methodology
    4.6.2. Recursive Partition for Longitudinal Data Based on Baseline Covariates
      4.6.2.1. Methodology
    4.6.3. LongCART Algorithm
    4.6.4. Example of Recursive Partitioning of Longitudinal Data
  4.7. Analysis of Ordinal Data
  4.8. Examples - Analysis of Ordinal Data
    4.8.1. Analysis of Cleveland Clinic Heart Data (Ordinal)
    4.8.2. Analysis of Cleveland Clinic Heart Data (Twoing)
  4.9. Advantages and Disadvantages of Trees
5. Support Vector Machine
  5.1. General Theory of Classification and Regression in Hyperplane
    5.1.1. Separable Case
    5.1.2. Non-separable Case
      5.1.2.1. Method of Stochastic Approximation
      5.1.2.2. Method of Sigmoid Approximations
      5.1.2.3. Method of Radial Basis Functions
  5.2. SVM for Indicator Functions
    5.2.1. Optimal Hyperplane for Separable Data Sets
      5.2.1.1. Constructing the Optimal Hyperplane
    5.2.2. Optimal Hyperplane for Non-Separable Sets
      5.2.2.1. Generalization of the Optimal Hyperplane
    5.2.3. Support Vector Machine
    5.2.4. Constructing SVM
      5.2.4.1. Polynomial Kernel Functions
      5.2.4.2. Radial Basis Kernel Functions
    5.2.5. Example: Analysis of Binary Classification Using SVM
    5.2.6. Example: Effect of Kernel Selection
  5.3. SVM for Continuous Data
    5.3.1. Minimizing the Risk with ε-insensitive Loss Functions
    5.3.2. Example: Regression Analysis Using SVM
  5.4. SVM for Survival Data Analysis
    5.4.1. Example: Analysis of Survival Data Using SVM
  5.5. Feature Elimination for SVM
    5.5.1. Example: Gene Selection via SVM with Feature Elimination
  5.6. Sparse Bayesian Learning with Relevance Vector Machine (RVM)
    5.6.1. Example: Regression Analysis Using RVM
    5.6.2. Example: Curve Fitting for SVM and RVM
  5.7. SV Machines for Function Estimation
6. Cluster Analysis
  6.1. Measure of Distance/Dissimilarity
    6.1.1. Continuous Variables
    6.1.2. Binary and Categorical Variables
    6.1.3. Mixed Data Types
    6.1.4. Other Measures of Dissimilarity
  6.2. Hierarchical Clustering
    6.2.1. Options of Linkage
    6.2.2. Example of Hierarchical Clustering
  6.3. K-means Cluster
    6.3.1. General Description of K-means Clustering
    6.3.2. Estimating the Number of Clusters
  6.4. The PAM Clustering Algorithm
    6.4.1. Example of K-means with PAM Clustering Algorithm
  6.5. Bagged Clustering
    6.5.1. Example of Bagged Clustering
  6.6. RandomForest for Clustering
    6.6.1. Example: Random Forest for Clustering
  6.7. Mixture Models/Model-based Cluster Analysis
  6.8. Stability of Clusters
  6.9. Consensus Clustering
    6.9.1. Determination of Clusters
    6.9.2. Example of Consensus Clustering on RNA Sequence Data
  6.10. The Integrative Clustering Framework
    6.10.1. Example: Integrative Clustering
7. Neural Network
  7.1. General Theory of Neural Network
  7.2. Elemental Aspects and Structure of Artificial Neural Networks
  7.3. Multilayer Perceptrons
    7.3.1. The Simple (Single Unit) Perceptron
    7.3.2. Training Perceptron Learning
  7.4. Multilayer Perceptrons (MLP)
    7.4.1. Architectures of MLP
    7.4.2. Training MLP
  7.5. Deep Learning
    7.5.1. Model Parameterization
  7.6. Few Pros and Cons of Neural Networks
  7.7. Examples
8. Causal Inference and Matching
  8.1. Introduction
  8.2. Three Layer Causal Hierarchy
  8.3. Seven Tools of Causal Inference
  8.4. Statistical Framework of Causal Inferences
  8.5. Propensity Score
  8.6. Methodologies of Matching
    8.6.1. Nearest Neighbor (or Greedy) Matching
      8.6.1.1. Example Using Nearest Neighbor Matching
    8.6.2. Exact Matching
      8.6.2.1. Example
    8.6.3. Mahalanobis Distance Matching
      8.6.3.1. Example
    8.6.4. Genetic Matching
      8.6.4.1. Example
  8.7. Optimal Matching
    8.7.0.1. Example
  8.8. Full Matching
    8.8.0.1. Example
    8.8.1. Analysis of Data After Matching
      8.8.1.1. Example
  8.9. Cluster Matching
    8.9.1. Example
9. Business and Commercial Data Modeling
  9.1. Case Study One: Marketing Campaigns of a Portuguese Banking Institution
    9.1.1. Description of Data
    9.1.2. Data Analysis
      9.1.2.1. Analysis via Lasso
      9.1.2.2. Analysis via Elastic Net
      9.1.2.3. Analysis via SIS
      9.1.2.4. Analysis via rpart
      9.1.2.5. Analysis via randomForest
      9.1.2.6. Analysis via xgboost
  9.2. Summary
  9.3. Case Study Two: Polish Companies Bankruptcy Data
    9.3.1. Description of Data
    9.3.2. Data Analysis
      9.3.2.1. Analysis of Year-1 Data (univariate analysis)
      9.3.2.2. Analysis of Year-3 Data (univariate analysis)
      9.3.2.3. Analysis of Year-5 Data (univariate analysis)
      9.3.2.4. Analysis of Year-1 Data (composite analysis)
      9.3.2.5. Analysis of Year-3 Data (composite analysis)
      9.3.2.6. Analysis of Year-5 Data (composite analysis)
  9.4. Summary
10. Analysis of Response Profiles
  10.1. Introduction
  10.2. Data Example
  10.3. Transition of Response States
  10.4. Classification of Response Profiles
    10.4.1. Dissimilarities Between Response Profiles
    10.4.2. Visualizing Clusters via Multidimensional Scaling
    10.4.3. Response Profile Differences among Clusters
    10.4.4. Significant Clinical Variables for Each Cluster
  10.5. Modeling of Response Profiles via GEE
    10.5.1. Marginal Models
    10.5.2. Estimation of Marginal Regression Parameters
    10.5.3. Local Odds Ratio
    10.5.4. Results of Modeling
  10.6. Summary
Bibliography
Index