Introduction to Machine Learning, 4th edition
- Length: 712 pages
- Edition: 4
- Language: English
- Publisher: The MIT Press
- Publication Date: 2020-03-24
- ISBN-10: 0262043793
- ISBN-13: 9780262043793
- Sales Rank: #226533
A substantially revised fourth edition of a comprehensive textbook, including new coverage of recent advances in deep learning and neural networks.
The goal of machine learning is to program computers to use example data or past experience to solve a given problem. Machine learning underlies such exciting new technologies as self-driving cars, speech recognition, and translation applications. This substantially revised fourth edition of a comprehensive, widely used machine learning textbook offers new coverage of recent advances in the field in both theory and practice, including developments in deep learning and neural networks.
The book covers a broad array of topics not usually included in introductory machine learning texts, including supervised learning, Bayesian decision theory, parametric methods, semiparametric methods, nonparametric methods, multivariate analysis, hidden Markov models, reinforcement learning, kernel machines, graphical models, Bayesian estimation, and statistical testing. The fourth edition offers a new chapter on deep learning that discusses training, regularizing, and structuring deep neural networks, including convolutional and generative adversarial networks; new material in the chapter on reinforcement learning covering deep networks, policy gradient methods, and deep reinforcement learning; new material in the chapter on multilayer perceptrons on autoencoders and the word2vec network; and a discussion of t-SNE, a popular method of dimensionality reduction. New appendixes provide background material on linear algebra and optimization. End-of-chapter exercises help readers apply the concepts they have learned. Introduction to Machine Learning can be used in courses for advanced undergraduate and graduate students and as a reference for professionals.
Contents:
- Cover, Copyright, Contents, Preface, Notations
- 1 Introduction: 1.1 What Is Machine Learning?; 1.2 Examples of Machine Learning Applications (1.2.1 Association Rules; 1.2.2 Classification; 1.2.3 Regression; 1.2.4 Unsupervised Learning; 1.2.5 Reinforcement Learning); 1.3 History; 1.4 Related Topics (1.4.1 High-Performance Computing; 1.4.2 Data Privacy and Security; 1.4.3 Model Interpretability and Trust; 1.4.4 Data Science); 1.5 Exercises; 1.6 References
- 2 Supervised Learning: 2.1 Learning a Class from Examples; 2.2 Vapnik-Chervonenkis Dimension; 2.3 Probably Approximately Correct Learning; 2.4 Noise; 2.5 Learning Multiple Classes; 2.6 Regression; 2.7 Model Selection and Generalization; 2.8 Dimensions of a Supervised Machine Learning Algorithm; 2.9 Notes; 2.10 Exercises; 2.11 References
- 3 Bayesian Decision Theory: 3.1 Introduction; 3.2 Classification; 3.3 Losses and Risks; 3.4 Discriminant Functions; 3.5 Association Rules; 3.6 Notes; 3.7 Exercises; 3.8 References
- 4 Parametric Methods: 4.1 Introduction; 4.2 Maximum Likelihood Estimation (4.2.1 Bernoulli Density; 4.2.2 Multinomial Density; 4.2.3 Gaussian (Normal) Density); 4.3 Evaluating an Estimator: Bias and Variance; 4.4 The Bayes’ Estimator; 4.5 Parametric Classification; 4.6 Regression; 4.7 Tuning Model Complexity: Bias/Variance Dilemma; 4.8 Model Selection Procedures; 4.9 Notes; 4.10 Exercises; 4.11 References
- 5 Multivariate Methods: 5.1 Multivariate Data; 5.2 Parameter Estimation; 5.3 Estimation of Missing Values; 5.4 Multivariate Normal Distribution; 5.5 Multivariate Classification; 5.6 Tuning Complexity; 5.7 Discrete Features; 5.8 Multivariate Regression; 5.9 Notes; 5.10 Exercises; 5.11 References
- 6 Dimensionality Reduction: 6.1 Introduction; 6.2 Subset Selection; 6.3 Principal Component Analysis; 6.4 Feature Embedding; 6.5 Factor Analysis; 6.6 Singular Value Decomposition and Matrix Factorization; 6.7 Multidimensional Scaling; 6.8 Linear Discriminant Analysis; 6.9 Canonical Correlation Analysis; 6.10 Isomap; 6.11 Locally Linear Embedding; 6.12 Laplacian Eigenmaps; 6.13 t-Distributed Stochastic Neighbor Embedding; 6.14 Notes; 6.15 Exercises; 6.16 References
- 7 Clustering: 7.1 Introduction; 7.2 Mixture Densities; 7.3 k-Means Clustering; 7.4 Expectation-Maximization Algorithm; 7.5 Mixtures of Latent Variable Models; 7.6 Supervised Learning after Clustering; 7.7 Spectral Clustering; 7.8 Hierarchical Clustering; 7.9 Choosing the Number of Clusters; 7.10 Notes; 7.11 Exercises; 7.12 References
- 8 Nonparametric Methods: 8.1 Introduction; 8.2 Nonparametric Density Estimation (8.2.1 Histogram Estimator; 8.2.2 Kernel Estimator; 8.2.3 k-Nearest Neighbor Estimator); 8.3 Generalization to Multivariate Data; 8.4 Nonparametric Classification; 8.5 Condensed Nearest Neighbor; 8.6 Distance-Based Classification; 8.7 Outlier Detection; 8.8 Nonparametric Regression: Smoothing Models (8.8.1 Running Mean Smoother; 8.8.2 Kernel Smoother; 8.8.3 Running Line Smoother); 8.9 How to Choose the Smoothing Parameter; 8.10 Notes; 8.11 Exercises; 8.12 References
- 9 Decision Trees: 9.1 Introduction; 9.2 Univariate Trees (9.2.1 Classification Trees; 9.2.2 Regression Trees); 9.3 Pruning; 9.4 Rule Extraction from Trees; 9.5 Learning Rules from Data; 9.6 Multivariate Trees; 9.7 Notes; 9.8 Exercises; 9.9 References
- 10 Linear Discrimination: 10.1 Introduction; 10.2 Generalizing the Linear Model; 10.3 Geometry of the Linear Discriminant (10.3.1 Two Classes; 10.3.2 Multiple Classes); 10.4 Pairwise Separation; 10.5 Parametric Discrimination Revisited; 10.6 Gradient Descent; 10.7 Logistic Discrimination (10.7.1 Two Classes; 10.7.2 Multiple Classes; 10.7.3 Multiple Labels); 10.8 Learning to Rank; 10.9 Notes; 10.10 Exercises; 10.11 References
- 11 Multilayer Perceptrons: 11.1 Introduction (11.1.1 Understanding the Brain; 11.1.2 Neural Networks as a Paradigm for Parallel Processing); 11.2 The Perceptron; 11.3 Training a Perceptron; 11.4 Learning Boolean Functions; 11.5 Multilayer Perceptrons; 11.6 MLP as a Universal Approximator; 11.7 Backpropagation Algorithm (11.7.1 Nonlinear Regression; 11.7.2 Two-Class Discrimination; 11.7.3 Multiclass Discrimination; 11.7.4 Multilabel Discrimination); 11.8 Overtraining; 11.9 Learning Hidden Representations; 11.10 Autoencoders; 11.11 Word2vec Architecture; 11.12 Notes; 11.13 Exercises; 11.14 References
- 12 Deep Learning: 12.1 Introduction; 12.2 How to Train Multiple Hidden Layers (12.2.1 Rectified Linear Unit; 12.2.2 Initialization; 12.2.3 Generalizing Backpropagation to Multiple Hidden Layers); 12.3 Improving Training Convergence (12.3.1 Momentum; 12.3.2 Adaptive Learning Factor; 12.3.3 Batch Normalization); 12.4 Regularization (12.4.1 Hints; 12.4.2 Weight Decay; 12.4.3 Dropout); 12.5 Convolutional Layers (12.5.1 The Idea; 12.5.2 Formalization; 12.5.3 Examples: LeNet-5 and AlexNet; 12.5.4 Extensions; 12.5.5 Multimodal Deep Networks); 12.6 Tuning the Network Structure (12.6.1 Structure and Hyperparameter Search; 12.6.2 Skip Connections; 12.6.3 Gating Units); 12.7 Learning Sequences (12.7.1 Example Tasks; 12.7.2 Time-Delay Neural Networks; 12.7.3 Recurrent Networks; 12.7.4 Long Short-Term Memory Unit; 12.7.5 Gated Recurrent Unit); 12.8 Generative Adversarial Network; 12.9 Notes; 12.10 Exercises; 12.11 References
- 13 Local Models: 13.1 Introduction; 13.2 Competitive Learning (13.2.1 Online k-Means; 13.2.2 Adaptive Resonance Theory; 13.2.3 Self-Organizing Maps); 13.3 Radial Basis Functions; 13.4 Incorporating Rule-Based Knowledge; 13.5 Normalized Basis Functions; 13.6 Competitive Basis Functions; 13.7 Learning Vector Quantization; 13.8 The Mixture of Experts (13.8.1 Cooperative Experts; 13.8.2 Competitive Experts); 13.9 Hierarchical Mixture of Experts and Soft Decision Trees; 13.10 Notes; 13.11 Exercises; 13.12 References
- 14 Kernel Machines: 14.1 Introduction; 14.2 Optimal Separating Hyperplane; 14.3 The Nonseparable Case: Soft Margin Hyperplane; 14.4 ν-SVM; 14.5 Kernel Trick; 14.6 Vectorial Kernels; 14.7 Defining Kernels; 14.8 Multiple Kernel Learning; 14.9 Multiclass Kernel Machines; 14.10 Kernel Machines for Regression; 14.11 Kernel Machines for Ranking; 14.12 One-Class Kernel Machines; 14.13 Large Margin Nearest Neighbor Classifier; 14.14 Kernel Dimensionality Reduction; 14.15 Notes; 14.16 Exercises; 14.17 References
- 15 Graphical Models: 15.1 Introduction; 15.2 Canonical Cases for Conditional Independence; 15.3 Generative Models; 15.4 d-Separation; 15.5 Belief Propagation (15.5.1 Chains; 15.5.2 Trees; 15.5.3 Polytrees; 15.5.4 Junction Trees); 15.6 Undirected Graphs: Markov Random Fields; 15.7 Learning the Structure of a Graphical Model; 15.8 Influence Diagrams; 15.9 Notes; 15.10 Exercises; 15.11 References
- 16 Hidden Markov Models: 16.1 Introduction; 16.2 Discrete Markov Processes; 16.3 Hidden Markov Models; 16.4 Three Basic Problems of HMMs; 16.5 Evaluation Problem; 16.6 Finding the State Sequence; 16.7 Learning Model Parameters; 16.8 Continuous Observations; 16.9 The HMM as a Graphical Model; 16.10 Model Selection in HMMs; 16.11 Notes; 16.12 Exercises; 16.13 References
- 17 Bayesian Estimation: 17.1 Introduction; 17.2 Bayesian Estimation of the Parameters of a Discrete Distribution (17.2.1 K > 2 States: Dirichlet Distribution; 17.2.2 K = 2 States: Beta Distribution); 17.3 Bayesian Estimation of the Parameters of a Gaussian Distribution (17.3.1 Univariate Case: Unknown Mean, Known Variance; 17.3.2 Univariate Case: Unknown Mean, Unknown Variance; 17.3.3 Multivariate Case: Unknown Mean, Unknown Covariance); 17.4 Bayesian Estimation of the Parameters of a Function (17.4.1 Regression; 17.4.2 Regression with Prior on Noise Precision; 17.4.3 The Use of Basis/Kernel Functions; 17.4.4 Bayesian Classification); 17.5 Choosing a Prior; 17.6 Bayesian Model Comparison; 17.7 Bayesian Estimation of a Mixture Model; 17.8 Nonparametric Bayesian Modeling; 17.9 Gaussian Processes; 17.10 Dirichlet Processes and Chinese Restaurants; 17.11 Latent Dirichlet Allocation; 17.12 Beta Processes and Indian Buffets; 17.13 Notes; 17.14 Exercises; 17.15 References
- 18 Combining Multiple Learners: 18.1 Rationale; 18.2 Generating Diverse Learners; 18.3 Model Combination Schemes; 18.4 Voting; 18.5 Error-Correcting Output Codes; 18.6 Bagging; 18.7 Boosting; 18.8 The Mixture of Experts Revisited; 18.9 Stacked Generalization; 18.10 Fine-Tuning an Ensemble (18.10.1 Choosing a Subset of the Ensemble; 18.10.2 Constructing Metalearners); 18.11 Cascading; 18.12 Notes; 18.13 Exercises; 18.14 References
- 19 Reinforcement Learning: 19.1 Introduction; 19.2 Single State Case: K-Armed Bandit; 19.3 Elements of Reinforcement Learning; 19.4 Model-Based Learning (19.4.1 Value Iteration; 19.4.2 Policy Iteration); 19.5 Temporal Difference Learning (19.5.1 Exploration Strategies; 19.5.2 Deterministic Rewards and Actions; 19.5.3 Nondeterministic Rewards and Actions; 19.5.4 Eligibility Traces); 19.6 Generalization; 19.7 Partially Observable States (19.7.1 The Setting; 19.7.2 Example: The Tiger Problem); 19.8 Deep Q Learning; 19.9 Policy Gradients; 19.10 Learning to Play Backgammon and Go; 19.11 Notes; 19.12 Exercises; 19.13 References
- 20 Design and Analysis of Machine Learning Experiments: 20.1 Introduction; 20.2 Factors, Response, and Strategy of Experimentation; 20.3 Response Surface Design; 20.4 Randomization, Replication, and Blocking; 20.5 Guidelines for Machine Learning Experiments; 20.6 Cross-Validation and Resampling Methods (20.6.1 K-Fold Cross-Validation; 20.6.2 5 × 2 Cross-Validation; 20.6.3 Bootstrapping); 20.7 Measuring Classifier Performance; 20.8 Interval Estimation; 20.9 Hypothesis Testing; 20.10 Assessing a Classification Algorithm’s Performance (20.10.1 Binomial Test; 20.10.2 Approximate Normal Test; 20.10.3 t Test); 20.11 Comparing Two Classification Algorithms (20.11.1 McNemar’s Test; 20.11.2 K-Fold Cross-Validated Paired t Test; 20.11.3 5 × 2 cv Paired t Test; 20.11.4 5 × 2 cv Paired F Test); 20.12 Comparing Multiple Algorithms: Analysis of Variance; 20.13 Comparison over Multiple Datasets (20.13.1 Comparing Two Algorithms; 20.13.2 Multiple Algorithms); 20.14 Multivariate Tests (20.14.1 Comparing Two Algorithms; 20.14.2 Comparing Multiple Algorithms); 20.15 Notes; 20.16 Exercises; 20.17 References
- A Probability: A.1 Elements of Probability (A.1.1 Axioms of Probability; A.1.2 Conditional Probability); A.2 Random Variables (A.2.1 Probability Distribution and Density Functions; A.2.2 Joint Distribution and Density Functions; A.2.3 Conditional Distributions; A.2.4 Bayes’ Rule; A.2.5 Expectation; A.2.6 Variance; A.2.7 Weak Law of Large Numbers); A.3 Special Random Variables (A.3.1 Bernoulli Distribution; A.3.2 Binomial Distribution; A.3.3 Multinomial Distribution; A.3.4 Uniform Distribution; A.3.5 Normal (Gaussian) Distribution; A.3.6 Chi-Square Distribution; A.3.7 t Distribution; A.3.8 F Distribution); A.4 References
- B Linear Algebra: B.1 Vectors; B.2 Matrices; B.3 Similarity of Vectors; B.4 Square Matrices; B.5 Linear Dependence and Ranks; B.6 Inverses; B.7 Positive Definite Matrices; B.8 Trace and Determinant; B.9 Eigenvalues and Eigenvectors; B.10 Spectral Decomposition; B.11 Singular Value Decomposition; B.12 References
- C Optimization: C.1 Introduction; C.2 Linear Optimization; C.3 Convex Optimization; C.4 Duality; C.5 Local Optimization; C.6 References
- Index