Grokking Machine Learning
- Length: 512 pages
- Edition: 1
- Language: English
- Publisher: Manning
- Publication Date: 2021-12-14
- ISBN-10: 1617295914
- ISBN-13: 9781617295911
- Sales Rank: #428333
Discover valuable machine learning techniques you can understand and apply using just high-school math.
Grokking Machine Learning teaches you how to apply ML to your projects using only standard Python code and high school-level math. No specialist knowledge is required to tackle the hands-on exercises using Python and readily available machine learning tools. Packed with easy-to-follow Python-based exercises and mini-projects, this book sets you on the path to becoming a machine learning expert.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology
Discover powerful machine learning techniques you can understand and apply using only high school math! Put simply, machine learning is a set of techniques for data analysis based on algorithms that deliver better results as you give them more data. ML powers many cutting-edge technologies, such as recommendation systems, facial recognition software, smart speakers, and even self-driving cars. This unique book introduces the core concepts of machine learning, using relatable examples, engaging exercises, and crisp illustrations.
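To make the "better results as you give them more data" idea concrete, here is a minimal illustrative sketch (my own example, not code from the book) showing a linear regression model's test error shrinking as the training set grows; it assumes NumPy and scikit-learn are installed:

```python
# Illustrative sketch: test error typically drops as training data grows.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

def make_data(n):
    # Noisy samples of a simple linear relationship: y = 3x + 2 + noise
    x = rng.uniform(0, 10, size=(n, 1))
    y = 3 * x[:, 0] + 2 + rng.normal(0, 1, size=n)
    return x, y

x_test, y_test = make_data(200)
for n_train in (10, 100, 1000):
    x_train, y_train = make_data(n_train)
    model = LinearRegression().fit(x_train, y_train)
    error = mean_absolute_error(y_test, model.predict(x_test))
    print(f"{n_train:>5} training points -> test MAE {error:.3f}")
```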
About the book
Grokking Machine Learning presents machine learning algorithms and techniques in a way that anyone can understand. This book skips the confusing academic jargon and offers clear explanations that require only basic algebra. As you go, you’ll build interesting projects with Python, including models for spam detection and image recognition. You’ll also pick up practical skills for cleaning and preparing data.
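For a taste of the spam-detection project mentioned above, here is a hedged sketch of that kind of model. The book implements naive Bayes by hand (see chapter 8); this version instead leans on scikit-learn, and the four-email dataset is made up for illustration:

```python
# Sketch of a bag-of-words naive Bayes spam classifier (illustrative only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up dataset: 1 = spam, 0 = not spam
emails = [
    "win a free lottery prize now",
    "cheap meds, click here to win",
    "meeting notes attached for review",
    "lunch tomorrow to discuss the project",
]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()           # turn each email into word counts
features = vectorizer.fit_transform(emails)
model = MultinomialNB().fit(features, labels)

test = vectorizer.transform(["free prize, click to win"])
print(model.predict(test))               # -> [1], classified as spam
```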
What’s inside
- Supervised algorithms for classifying and splitting data
- Methods for cleaning and simplifying data
- Machine learning packages and tools
- Neural networks and ensemble methods for complex datasets
About the reader
For readers who know basic Python. No machine learning knowledge necessary.
About the author
Luis G. Serrano is a research scientist in quantum artificial intelligence. Previously, he was a Machine Learning Engineer at Google and Lead Artificial Intelligence Educator at Apple.
Table of Contents
1 What is machine learning? It is common sense, except done by a computer
2 Types of machine learning
3 Drawing a line close to our points: Linear regression
4 Optimizing the training process: Underfitting, overfitting, testing, and regularization
5 Using lines to split our points: The perceptron algorithm
6 A continuous approach to splitting points: Logistic classifiers
7 How do you measure classification models? Accuracy and its friends
8 Using probability to its maximum: The naive Bayes model
9 Splitting data by asking questions: Decision trees
10 Combining building blocks to gain more power: Neural networks
11 Finding boundaries with style: Support vector machines and the kernel method
12 Combining models to maximize results: Ensemble learning
13 Putting it all in practice: A real-life example of data engineering and machine learning
Detailed contents
- foreword
- preface
- acknowledgments
- about this book (How this book is organized: A roadmap; About the code; liveBook discussion forum)
- about the author
1 What is machine learning? It is common sense, except done by a computer
- I am super happy to join you in your learning journey!
- Machine learning is everywhere
- Do I need a heavy math and coding background to understand machine learning?
- Formulas and code are fun when seen as a language
- OK, so what exactly is machine learning?
- What is artificial intelligence?
- What is machine learning?
- And now that we’re at it, what is deep learning?
- How do we get machines to make decisions with data? The remember-formulate-predict framework
- How do humans think?
- Some machine learning lingo: models and algorithms
- Some examples of models that humans use
- Some examples of models that machines use
- Summary
2 Types of machine learning
- What is the difference between labeled and unlabeled data?
- What is data? And what are features? Labels? Predictions
- Labeled and unlabeled data
- Supervised learning: The branch of machine learning that works with labeled data
- Regression models predict numbers
- Classification models predict a state
- Unsupervised learning: The branch of machine learning that works with unlabeled data
- Clustering algorithms split a dataset into similar groups
- Dimensionality reduction simplifies data without losing too much information
- Other ways of simplifying our data: Matrix factorization and singular value decomposition
- Generative machine learning
- What is reinforcement learning?
- Summary
- Exercises
3 Drawing a line close to our points: Linear regression
- The problem: We need to predict the price of a house
- The solution: Building a regression model for housing prices
- The remember step: Looking at the prices of existing houses
- The formulate step: Formulating a rule that estimates the price of the house
- The predict step: What do we do when a new house comes on the market?
- What if we have more variables? Multivariate linear regression
- Some questions that arise and some quick answers
- How to get the computer to draw this line: The linear regression algorithm
- Crash course on slope and y-intercept
- A simple trick to move a line closer to a set of points, one point at a time
- The square trick: A much more clever way of moving our line closer to one of the points
- The absolute trick: Another useful trick to move the line closer to the points
- The linear regression algorithm: Repeating the absolute or square trick many times to move the line
- Loading our data and plotting it
- Using the linear regression algorithm in our dataset
- Using the model to make predictions
- The general linear regression algorithm (optional)
- How do we measure our results? The error function
- The absolute error: A metric that tells us how good our model is by adding distances
- The square error: A metric that tells us how good our model is by adding squares of distances
- Mean absolute and (root) mean square errors are more common in real life
- Gradient descent: How to decrease an error function by slowly descending from a mountain
- Plotting the error function and knowing when to stop running the algorithm
- Do we train using one point at a time or many? Stochastic and batch gradient descent
- Real-life application: Using Turi Create to predict housing prices in India
- What if the data is not in a line? Polynomial regression
- A special kind of curved functions: Polynomials
- Nonlinear data? No problem: Let’s try to fit a polynomial curve to it
- Parameters and hyperparameters
- Applications of regression
- Recommendation systems
- Video and music recommendations
- Product recommendations
- Health care
- Summary
- Exercises
4 Optimizing the training process: Underfitting, overfitting, testing, and regularization
- An example of underfitting and overfitting using polynomial regression
- How do we get the computer to pick the right model? By testing
- How do we pick the testing set, and how big should it be?
- Can we use our testing data for training the model? No.
- Where did we break the golden rule, and how do we fix it? The validation set
- A numerical way to decide how complex our model should be: The model complexity graph
- Another alternative to avoiding overfitting: Regularization
- Another example of overfitting: Movie recommendations
- Measuring how complex a model is: L1 and L2 norm
- Modifying the error function to solve our problem: Lasso regression and ridge regression
- Regulating the amount of performance and complexity in our model: The regularization parameter
- Effects of L1 and L2 regularization in the coefficients of the model
- An intuitive way to see regularization
- Polynomial regression, testing, and regularization with Turi Create
- Summary
- Exercises
5 Using lines to split our points: The perceptron algorithm
- The problem: We are on an alien planet, and we don’t know their language!
- A slightly more complicated planet
- Does our classifier need to be correct all the time? No
- A more general classifier and a slightly different way to define lines
- The step function and activation functions: A condensed way to get predictions
- What happens if I have more than two words?
- General definition of the perceptron classifier
- The bias, the y-intercept, and the inherent mood of a quiet alien
- How do we determine whether a classifier is good or bad? The error function
- How to compare classifiers? The error function
- How to find a good classifier? The perceptron algorithm
- The perceptron trick: A way to slightly improve the perceptron
- Repeating the perceptron trick many times: The perceptron algorithm
- Gradient descent
- Stochastic and batch gradient descent
- Coding the perceptron algorithm
- Coding the perceptron trick
- Coding the perceptron algorithm using Turi Create
- Applications of the perceptron algorithm
- Spam email filters
- Recommendation systems
- Health care
- Computer vision
- Summary
- Exercises
6 A continuous approach to splitting points: Logistic classifiers
- Logistic classifiers: A continuous version of perceptron classifiers
- A probability approach to classification: The sigmoid function
- The dataset and the predictions
- The error functions: Absolute, square, and log loss
- Comparing classifiers using the log loss
- How to find a good logistic classifier? The logistic regression algorithm
- The logistic trick: A way to slightly improve the continuous perceptron
- Repeating the logistic trick many times: The logistic regression algorithm
- Stochastic, mini-batch, and batch gradient descent
- Coding the logistic regression algorithm
- Coding the logistic regression algorithm by hand
- Real-life application: Classifying IMDB reviews with Turi Create
- Classifying into multiple classes: The softmax function
- Summary
- Exercises
7 How do you measure classification models? Accuracy and its friends
- Accuracy: How often is my model correct?
- Two examples of models: Coronavirus and spam email
- A super effective yet super useless model
- How to fix the accuracy problem? Defining different types of errors and how to measure them
- False positives and false negatives: Which one is worse?
- Storing the correctly and incorrectly classified points in a table: The confusion matrix
- Recall: Among the positive examples, how many did we correctly classify?
- Precision: Among the examples we classified as positive, how many did we correctly classify?
- Combining recall and precision as a way to optimize both: The F-score
- Recall, precision, or F-scores: Which one should we use?
- A useful tool to evaluate our model: The receiver operating characteristic (ROC) curve
- Sensitivity and specificity: Two new ways to evaluate our model
- The receiver operating characteristic (ROC) curve: A way to optimize sensitivity and specificity in our model
- A metric that tells us how good our model is: The AUC (area under the curve)
- How to make decisions using the ROC curve
- Recall is sensitivity, but precision and specificity are different
- Summary
- Exercises
8 Using probability to its maximum: The naive Bayes model
- Sick or healthy? A story with Bayes’ theorem as the hero
- Prelude to Bayes’ theorem: The prior, the event, and the posterior
- Use case: Spam-detection model
- Finding the prior: The probability that any email is spam
- Finding the posterior: The probability that an email is spam, knowing that it contains a particular word
- What the math just happened? Turning ratios into probabilities
- What about two words? The naive Bayes algorithm
- What about more than two words?
- Building a spam-detection model with real data
- Data preprocessing
- Finding the priors
- Finding the posteriors with Bayes’ theorem
- Implementing the naive Bayes algorithm
- Further work
- Summary
- Exercises
9 Splitting data by asking questions: Decision trees
- The problem: We need to recommend apps to users according to what they are likely to download
- The solution: Building an app-recommendation system
- First step to build the model: Asking the best question
- Second step to build the model: Iterating
- Last step: When to stop building the tree and other hyperparameters
- The decision tree algorithm: How to build a decision tree and make predictions with it
- Beyond questions like yes/no
- Splitting the data using non-binary categorical features, such as dog/cat/bird
- Splitting the data using continuous features, such as age
- The graphical boundary of decision trees
- Using Scikit-Learn to build a decision tree
- Real-life application: Modeling student admissions with Scikit-Learn
- Setting hyperparameters in Scikit-Learn
- Decision trees for regression
- Applications
- Decision trees are widely used in health care
- Decision trees are useful in recommendation systems
- Summary
- Exercises
10 Combining building blocks to gain more power: Neural networks
- Neural networks with an example: A more complicated alien planet
- Solution: If one line is not enough, use two lines to classify your dataset
- Why two lines? Is happiness not linear?
- Combining the outputs of perceptrons into another perceptron
- A graphical representation of perceptrons
- A graphical representation of neural networks
- The boundary of a neural network
- The general architecture of a fully connected neural network
- Training neural networks
- Error function: A way to measure how the neural network is performing
- Backpropagation: The key step in training the neural network
- Potential problems: From overfitting to vanishing gradients
- Techniques for training neural networks: Regularization and dropout
- Different activation functions: Hyperbolic tangent (tanh) and the rectified linear unit (ReLU)
- Neural networks with more than one output: The softmax function
- Hyperparameters
- Coding neural networks in Keras
- A graphical example in two dimensions
- Training a neural network for image recognition
- Neural networks for regression
- Other architectures for more complex datasets
- How neural networks see: Convolutional neural networks (CNN)
- How neural networks talk: Recurrent neural networks (RNN), gated recurrent units (GRU), and long short-term memory networks (LSTM)
- How neural networks paint paintings: Generative adversarial networks (GAN)
- Summary
- Exercises
11 Finding boundaries with style: Support vector machines and the kernel method
- Using a new error function to build better classifiers
- Classification error function: Trying to classify the points correctly
- Distance error function: Trying to separate our two lines as far apart as possible
- Adding the two error functions to obtain the error function
- Do we want our SVM to focus more on classification or distance? The C parameter can help us
- Coding support vector machines in Scikit-Learn
- Coding a simple SVM
- The C parameter
- Training SVMs with nonlinear boundaries: The kernel method
- Using polynomial equations to our benefit: The polynomial kernel
- Using bumps in higher dimensions to our benefit: The radial basis function (RBF) kernel
- Training an SVM with the RBF kernel
- Coding the kernel method
- Summary
- Exercises
12 Combining models to maximize results: Ensemble learning
- With a little help from our friends
- Bagging: Joining some weak learners randomly to build a strong learner
- Fitting a random forest manually
- Training a random forest in Scikit-Learn
- AdaBoost: Joining weak learners in a clever way to build a strong learner
- A big picture of AdaBoost: Building the weak learners
- Combining the weak learners into a strong learner
- Coding AdaBoost in Scikit-Learn
- Gradient boosting: Using decision trees to build strong learners
- XGBoost: An extreme way to do gradient boosting
- XGBoost similarity score: A new and effective way to measure similarity in a set
- Building the weak learners
- Tree pruning: A way to reduce overfitting by simplifying the weak learners
- Making the predictions
- Training an XGBoost model in Python
- Applications of ensemble methods
- Summary
- Exercises
13 Putting it all in practice: A real-life example of data engineering and machine learning
- The Titanic dataset
- The features of our dataset
- Using Pandas to load the dataset
- Using Pandas to study our dataset
- Cleaning up our dataset: Missing values and how to deal with them
- Dropping columns with missing data
- How to not lose the entire column: Filling in missing data
- Feature engineering: Transforming the features in our dataset before training the models
- Turning categorical data into numerical data: One-hot encoding
- Turning numerical data into categorical data (and why would we want to do this?): Binning
- Feature selection: Getting rid of unnecessary features
- Training our models
- Splitting the data into features and labels, and training and validation
- Training several models on our dataset
- Which model is better? Evaluating the models
- Testing the model
- Tuning the hyperparameters to find the best model: Grid search
- Using K-fold cross-validation to reuse our data as training and validation
- Summary
- Exercises
Appendix A: Solutions to the exercises
- Solutions for chapters 2–13 (Exercises 2.1–2.3, 3.1–3.4, 4.1–4.2, 5.1–5.3, 6.1–6.3, 7.1–7.4, 8.1–8.3, 9.1–9.3, 10.1–10.3, 11.1–11.2, 12.1–12.2, and 13.1)
Appendix B: The math behind gradient descent: Coming down a mountain using derivatives and slopes
- Using gradient descent to decrease functions
- Using gradient descent to train models
- Using gradient descent to train linear regression models
- Using gradient descent to train classification models
- Using gradient descent to train neural networks
- Using gradient descent for regularization
- Getting stuck on local minima: How it happens, and how we solve it
Appendix C: References
- General references: courses, blogs and YouTube channels, and books
- Per-chapter references: code, datasets, videos, books, courses, tools, and articles and blog posts
- Graphics and image icons
index