Machine Learning – A Journey to Deep Learning: With Exercises and Answers
- Length: 624 pages
- Edition: 1
- Language: English
- Publisher: World Scientific Pub Co Inc
- Publication Date: 2021-02-08
- ISBN-10: 9811234051
- ISBN-13: 9789811234057
- Sales Rank: #7420053
This unique compendium discusses some core ideas for the development and implementation of machine learning from three different perspectives: the statistical perspective, the artificial neural network perspective, and the deep learning methodology. This useful reference text provides a solid foundation in machine learning and should prepare readers to apply and understand machine learning algorithms, as well as to invent new machine learning methods. It tells a story that leads from the perceptron to deep learning, illustrated with concrete examples and including exercises and answers for students.
Table of Contents:

Front matter: Cover Page, Title Page, Copyright Page, Dedication, Preface, Contents

1. Introduction
- 1.1 What is Machine Learning: 1.1.1 Symbolical Learning; 1.1.2 Statistical Machine Learning; 1.1.3 Supervised and Unsupervised Machine Learning
- 1.2 It all began with the Perceptron: 1.2.1 Artificial Neuron; 1.2.2 Perceptron; 1.2.3 XOR-Problem
- 1.3 Road to Deep Learning: 1.3.1 Backpropagation
- 1.4 Synopsis: 1.4.1 Content
- 1.5 Exercises and Answers

2. Probability and Information
- 2.1 Probability Theory: 2.1.1 Conditional probability; 2.1.2 Law of Total Probability; 2.1.3 Bayes's rule; 2.1.4 Expectation; 2.1.5 Covariance
- 2.2 Distribution: 2.2.1 Gaussian Distribution; 2.2.2 Laplace Distribution; 2.2.3 Bernoulli Distribution
- 2.3 Information Theory: 2.3.1 Surprise and Information; 2.3.2 Entropy; 2.3.3 Conditional Entropy; 2.3.4 Relative Entropy; 2.3.5 Mutual Information; 2.3.6 Relationship
- 2.4 Cross Entropy
- 2.5 Exercises and Answers

3. Linear Algebra and Optimization
- 3.1 Vectors: 3.1.1 Norm; 3.1.2 Distance function; 3.1.3 Scalar Product; 3.1.4 Linear Independent Vectors; 3.1.5 Matrix Operations; 3.1.6 Tensor Product; 3.1.7 Hadamard product; 3.1.8 Element-wise division
- 3.2 Matrix Calculus: 3.2.1 Gradient; 3.2.2 Jacobian; 3.2.3 Hessian Matrix
- 3.3 Gradient based Numerical Optimization: 3.3.1 Gradient descent; 3.3.2 Newton's Method; 3.3.3 Second and First Order Optimization
- 3.4 Dilemmas in Machine Learning: 3.4.1 The Curse of Dimensionality; 3.4.2 Numerical Computation
- 3.5 Exercises and Answers

4. Linear and Nonlinear Regression
- 4.1 Linear Regression: 4.1.1 Regression of a Line; 4.1.2 Multiple Linear Regression; 4.1.3 Design Matrix; 4.1.4 Squared-Error; 4.1.5 Closed-Form Solution; 4.1.6 Example; 4.1.7 Moore-Penrose Matrix
- 4.2 Linear Basis Function Models: 4.2.1 Example Logarithmic Curve; 4.2.2 Example Polynomial Regression
- 4.3 Model selection
- 4.4 Bayesian Regression: 4.4.1 Maximizing the Likelihood or the Posterior; 4.4.2 Bayesian Learning; 4.4.3 Maximizing a posteriori; 4.4.4 Relation between Regularized Least-Squares and MAP; 4.4.5 LASSO Regularizer
- 4.5 Linear Regression for classification
- 4.6 Exercises and Answers

5. Perceptron
- 5.1 Linear Regression and Linear Artificial Neuron: 5.1.1 Regularization; 5.1.2 Stochastic gradient descent
- 5.2 Continuous Differentiable Activation Functions: 5.2.1 Sigmoid Activation Functions; 5.2.2 Perceptron with sgn0; 5.2.3 Cross Entropy Loss Function; 5.2.4 Linear Unit versus Sigmoid Unit; 5.2.5 Logistic Regression
- 5.3 Multiclass Linear Discriminant: 5.3.1 Cross Entropy Loss Function for softmax; 5.3.2 Logistic Regression Algorithm
- 5.4 Multilayer Perceptron
- 5.5 Exercises and Answers

6. Multilayer Perceptron
- 6.1 Motivations
- 6.2 Networks with Hidden Nonlinear Layers: 6.2.1 Backpropagation; 6.2.2 Example; 6.2.3 Activation Function
- 6.3 Cross Entropy Error Function: 6.3.1 Backpropagation; 6.3.2 Comparison; 6.3.3 Computing Power; 6.3.4 Generalization
- 6.4 Training: 6.4.1 Overfitting; 6.4.2 Early-Stopping Rule; 6.4.3 Regularization
- 6.5 Deep Learning and Backpropagation
- 6.6 Exercises and Answers

7. Learning Theory
- 7.1 Supervised Classification Problem
- 7.2 Probability of a bad sample
- 7.3 Infinite hypotheses set
- 7.4 The VC Dimension
- 7.5 A Fundamental Trade-off
- 7.6 Computing VC Dimension: 7.6.1 The VC Dimension of a Perceptron; 7.6.2 A Heuristic way to measure hypotheses space complexity
- 7.7 The Regression Problem: 7.7.1 Example
- 7.8 Exercises and Answers

8. Model Selection
- 8.1 The confusion matrix: 8.1.1 Precision and Recall; 8.1.2 Several Classes
- 8.2 Validation Set and Test Set
- 8.3 Cross-Validation
- 8.4 Minimum-Description-Length: 8.4.1 Occam's razor; 8.4.2 Kolmogorov complexity theory; 8.4.3 Learning as Data Compression; 8.4.4 Two-part code MDL principle
- 8.5 Paradox of Deep Learning Complexity
- 8.6 Exercises and Answers

9. Clustering
- 9.1 Introduction
- 9.2 K-means Clustering: 9.2.1 Standard K-means; 9.2.2 Sequential K-means
- 9.3 Mixture of Gaussians: 9.3.1 EM for Gaussian Mixtures; 9.3.2 Algorithm: EM for Gaussian mixtures; 9.3.3 Example
- 9.4 EM and K-means Clustering
- 9.5 Exercises and Answers

10. Radial Basis Networks
- 10.1 Cover's theorem: 10.1.1 Cover's theorem on the separability (1965)
- 10.2 Interpolation Problem: 10.2.1 Micchelli's Theorem
- 10.3 Radial Basis Function Networks: 10.3.1 Modifications of Radial Basis Function Networks; 10.3.2 Interpretation of Hidden Units
- 10.4 Exercises and Answers

11. Support Vector Machines
- 11.1 Margin
- 11.2 Optimal Hyperplane for Linear Separable Patterns
- 11.3 Support Vectors
- 11.4 Quadratic Optimization for Finding the Optimal Hyperplane: 11.4.1 Dual Problem
- 11.5 Optimal Hyperplane for Non-separable Patterns: 11.5.1 Dual Problem
- 11.6 Support Vector Machine as a Kernel Machine: 11.6.1 Kernel Trick; 11.6.2 Dual Problem; 11.6.3 Classification
- 11.7 Constructing Kernels: 11.7.1 Gaussian Kernel; 11.7.2 Sigmoidal Kernel; 11.7.3 Generative Model Kernels
- 11.8 Conclusion: 11.8.1 SVMs, MLPs and RBFNs
- 11.9 Exercises and Answers

12. Deep Learning
- 12.1 Introduction: 12.1.1 Loss Function; 12.1.2 Mini-Batch
- 12.2 Why Deep Networks? 12.2.1 Hierarchical Organization; 12.2.2 Boolean Functions; 12.2.3 Curse of dimensionality; 12.2.4 Local Minima; 12.2.5 Can represent big training sets; 12.2.6 Efficient Model Selection; 12.2.7 Criticism of Deep Neural Networks
- 12.3 Vanishing Gradients Problem: 12.3.1 Rectified Linear Unit (ReLU); 12.3.2 Residual Learning; 12.3.3 Batch Normalization
- 12.4 Regularization by Dropout
- 12.5 Weight Initialization
- 12.6 Faster Optimizers: 12.6.1 Momentum; 12.6.2 Nesterov Momentum; 12.6.3 AdaGrad; 12.6.4 RMSProp; 12.6.5 Adam; 12.6.6 Notation
- 12.7 Transfer Learning
- 12.8 Conclusion
- 12.9 Exercises and Answers

13. Convolutional Networks
- 13.1 Hierarchical Networks: 13.1.1 Biological Vision; 13.1.2 Neocognitron; 13.1.3 Map transformation cascade
- 13.2 Convolutional Neural Networks: 13.2.1 CNNs and Kernels in Image Processing; 13.2.2 Data Augmentation; 13.2.3 Case Studies
- 13.3 Exercises and Answers

14. Recurrent Networks
- 14.1 Sequence Modelling
- 14.2 Recurrent Neural Networks: 14.2.1 Elman recurrent neural networks; 14.2.2 Jordan recurrent neural networks; 14.2.3 Single Output; 14.2.4 Backpropagation Through Time; 14.2.5 Deep Recurrent Networks
- 14.3 Long Short Term Memory
- 14.4 Process Sequences
- 14.5 Exercises and Answers

15. Autoencoders
- 15.1 Eigenvectors and Eigenvalues
- 15.2 The Karhunen-Loève transform: 15.2.1 Principal component analysis
- 15.3 Singular Value Decomposition: 15.3.1 Example; 15.3.2 Pseudoinverse; 15.3.3 SVD and PCA
- 15.4 Autoencoders
- 15.5 Undercomplete Autoencoders
- 15.6 Overcomplete Autoencoders: 15.6.1 Denoising Autoencoders
- 15.7 Exercises and Answers

16. Epilogue

Bibliography
Index