Grokking Machine Learning
- Length: 512 pages
- Edition: 1
- Language: English
- Publisher: Manning
- Publication Date: 2021-12-14
- ISBN-10: 1617295914
- ISBN-13: 9781617295911
- Sales Rank: #428333
Discover valuable machine learning techniques you can understand and apply using just high-school math.
Grokking Machine Learning teaches you how to apply ML to your projects using only standard Python code and high school-level math. No specialist knowledge is required to tackle the hands-on exercises using Python and readily available machine learning tools. Packed with easy-to-follow Python-based exercises and mini-projects, this book sets you on the path to becoming a machine learning expert.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology
Discover powerful machine learning techniques you can understand and apply using only high school math! Put simply, machine learning is a set of techniques for data analysis based on algorithms that deliver better results as you give them more data. ML powers many cutting-edge technologies, such as recommendation systems, facial recognition software, smart speakers, and even self-driving cars. This unique book introduces the core concepts of machine learning, using relatable examples, engaging exercises, and crisp illustrations.
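To make the "better results as you give them more data" idea concrete, here is a minimal illustrative sketch (my own example, not code from the book) showing a linear regression model's test error shrinking as the training set grows; it assumes NumPy and scikit-learn are installed:

```python
# Illustrative sketch: test error typically drops as training data grows.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

def make_data(n):
    # Noisy samples of a simple linear relationship: y = 3x + 2 + noise
    x = rng.uniform(0, 10, size=(n, 1))
    y = 3 * x[:, 0] + 2 + rng.normal(0, 1, size=n)
    return x, y

x_test, y_test = make_data(200)
for n_train in (10, 100, 1000):
    x_train, y_train = make_data(n_train)
    model = LinearRegression().fit(x_train, y_train)
    error = mean_absolute_error(y_test, model.predict(x_test))
    print(f"{n_train:>5} training points -> test MAE {error:.3f}")
```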
About the book
Grokking Machine Learning presents machine learning algorithms and techniques in a way that anyone can understand. This book skips the confusing academic jargon and offers clear explanations that require only basic algebra. As you go, you’ll build interesting projects with Python, including models for spam detection and image recognition. You’ll also pick up practical skills for cleaning and preparing data.
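For a taste of the spam-detection project mentioned above, here is a hedged sketch of that kind of model. The book implements naive Bayes by hand (see chapter 8); this version instead leans on scikit-learn, and the four-email dataset is made up for illustration:

```python
# Sketch of a bag-of-words naive Bayes spam classifier (illustrative only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up dataset: 1 = spam, 0 = not spam
emails = [
    "win a free lottery prize now",
    "cheap meds, click here to win",
    "meeting notes attached for review",
    "lunch tomorrow to discuss the project",
]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()           # turn each email into word counts
features = vectorizer.fit_transform(emails)
model = MultinomialNB().fit(features, labels)

test = vectorizer.transform(["free prize, click to win"])
print(model.predict(test))               # -> [1], classified as spam
```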
What’s inside
- Supervised algorithms for classifying and splitting data
- Methods for cleaning and simplifying data
- Machine learning packages and tools
- Neural networks and ensemble methods for complex datasets
About the reader
For readers who know basic Python. No machine learning knowledge necessary.
About the author
Luis G. Serrano is a research scientist in quantum artificial intelligence. Previously, he was a Machine Learning Engineer at Google and Lead Artificial Intelligence Educator at Apple.
Table of Contents
1 What is machine learning? It is common sense, except done by a computer
2 Types of machine learning
3 Drawing a line close to our points: Linear regression
4 Optimizing the training process: Underfitting, overfitting, testing, and regularization
5 Using lines to split our points: The perceptron algorithm
6 A continuous approach to splitting points: Logistic classifiers
7 How do you measure classification models? Accuracy and its friends
8 Using probability to its maximum: The naive Bayes model
9 Splitting data by asking questions: Decision trees
10 Combining building blocks to gain more power: Neural networks
11 Finding boundaries with style: Support vector machines and the kernel method
12 Combining models to maximize results: Ensemble learning
13 Putting it all in practice: A real-life example of data engineering and machine learning
Detailed contents
- foreword
- preface
- acknowledgments
- about this book (How this book is organized: A roadmap; About the code; liveBook discussion forum)
- about the author
1 What is machine learning? It is common sense, except done by a computer
- I am super happy to join you in your learning journey!
- Machine learning is everywhere
- Do I need a heavy math and coding background to understand machine learning?
- Formulas and code are fun when seen as a language
- OK, so what exactly is machine learning?
- What is artificial intelligence?
- What is machine learning?
- And now that we’re at it, what is deep learning?
- How do we get machines to make decisions with data? The remember-formulate-predict framework
- How do humans think?
- Some machine learning lingo: models and algorithms
- Some examples of models that humans use
- Some examples of models that machines use
- Summary
2 Types of machine learning
- What is the difference between labeled and unlabeled data?
- What is data? And what are features? Labels? Predictions
- Labeled and unlabeled data
- Supervised learning: The branch of machine learning that works with labeled data
- Regression models predict numbers
- Classification models predict a state
- Unsupervised learning: The branch of machine learning that works with unlabeled data
- Clustering algorithms split a dataset into similar groups
- Dimensionality reduction simplifies data without losing too much information
- Other ways of simplifying our data: Matrix factorization and singular value decomposition
- Generative machine learning
- What is reinforcement learning?
- Summary
- Exercises
3 Drawing a line close to our points: Linear regression
- The problem: We need to predict the price of a house
- The solution: Building a regression model for housing prices
- The remember step: Looking at the prices of existing houses
- The formulate step: Formulating a rule that estimates the price of the house
- The predict step: What do we do when a new house comes on the market?
- What if we have more variables? Multivariate linear regression
- Some questions that arise and some quick answers
- How to get the computer to draw this line: The linear regression algorithm
- Crash course on slope and y-intercept
- A simple trick to move a line closer to a set of points, one point at a time
- The square trick: A much more clever way of moving our line closer to one of the points
- The absolute trick: Another useful trick to move the line closer to the points
- The linear regression algorithm: Repeating the absolute or square trick many times to move the line
- Loading our data and plotting it
- Using the linear regression algorithm in our dataset
- Using the model to make predictions
- The general linear regression algorithm (optional)
- How do we measure our results? The error function
- The absolute error: A metric that tells us how good our model is by adding distances
- The square error: A metric that tells us how good our model is by adding squares of distances
- Mean absolute and (root) mean square errors are more common in real life
- Gradient descent: How to decrease an error function by slowly descending from a mountain
- Plotting the error function and knowing when to stop running the algorithm
- Do we train using one point at a time or many? Stochastic and batch gradient descent
- Real-life application: Using Turi Create to predict housing prices in India
- What if the data is not in a line? Polynomial regression
- A special kind of curved functions: Polynomials
- Nonlinear data? No problem: Let’s try to fit a polynomial curve to it
- Parameters and hyperparameters
- Applications of regression
- Recommendation systems
- Video and music recommendations
- Product recommendations
- Health care
- Summary
- Exercises
4 Optimizing the training process: Underfitting, overfitting, testing, and regularization
- An example of underfitting and overfitting using polynomial regression
- How do we get the computer to pick the right model? By testing
- How do we pick the testing set, and how big should it be?
- Can we use our testing data for training the model? No.
- Where did we break the golden rule, and how do we fix it? The validation set
- A numerical way to decide how complex our model should be: The model complexity graph
- Another alternative to avoiding overfitting: Regularization
- Another example of overfitting: Movie recommendations
- Measuring how complex a model is: L1 and L2 norm
- Modifying the error function to solve our problem: Lasso regression and ridge regression
- Regulating the amount of performance and complexity in our model: The regularization parameter
- Effects of L1 and L2 regularization in the coefficients of the model
- An intuitive way to see regularization
- Polynomial regression, testing, and regularization with Turi Create
- Summary
- Exercises
5 Using lines to split our points: The perceptron algorithm
- The problem: We are on an alien planet, and we don’t know their language!
- A slightly more complicated planet
- Does our classifier need to be correct all the time? No
- A more general classifier and a slightly different way to define lines
- The step function and activation functions: A condensed way to get predictions
- What happens if I have more than two words?
- General definition of the perceptron classifier
- The bias, the y-intercept, and the inherent mood of a quiet alien
- How do we determine whether a classifier is good or bad? The error function
- How to compare classifiers? The error function
- How to find a good classifier? The perceptron algorithm
- The perceptron trick: A way to slightly improve the perceptron
- Repeating the perceptron trick many times: The perceptron algorithm
- Gradient descent
- Stochastic and batch gradient descent
- Coding the perceptron algorithm
- Coding the perceptron trick
- Coding the perceptron algorithm using Turi Create
- Applications of the perceptron algorithm
- Spam email filters
- Recommendation systems
- Health care
- Computer vision
- Summary
- Exercises
6 A continuous approach to splitting points: Logistic classifiers
- Logistic classifiers: A continuous version of perceptron classifiers
- A probability approach to classification: The sigmoid function
- The dataset and the predictions
- The error functions: Absolute, square, and log loss
- Comparing classifiers using the log loss
- How to find a good logistic classifier? The logistic regression algorithm
- The logistic trick: A way to slightly improve the continuous perceptron
- Repeating the logistic trick many times: The logistic regression algorithm
- Stochastic, mini-batch, and batch gradient descent
- Coding the logistic regression algorithm
- Coding the logistic regression algorithm by hand
- Real-life application: Classifying IMDB reviews with Turi Create
- Classifying into multiple classes: The softmax function
- Summary
- Exercises
7 How do you measure classification models? Accuracy and its friends
- Accuracy: How often is my model correct?
- Two examples of models: Coronavirus and spam email
- A super effective yet super useless model
- How to fix the accuracy problem? Defining different types of errors and how to measure them
- False positives and false negatives: Which one is worse?
- Storing the correctly and incorrectly classified points in a table: The confusion matrix
- Recall: Among the positive examples, how many did we correctly classify?
- Precision: Among the examples we classified as positive, how many did we correctly classify?
- Combining recall and precision as a way to optimize both: The F-score
- Recall, precision, or F-scores: Which one should we use?
- A useful tool to evaluate our model: The receiver operating characteristic (ROC) curve
- Sensitivity and specificity: Two new ways to evaluate our model
- The receiver operating characteristic (ROC) curve: A way to optimize sensitivity and specificity in our model
- A metric that tells us how good our model is: The AUC (area under the curve)
- How to make decisions using the ROC curve
- Recall is sensitivity, but precision and specificity are different
- Summary
- Exercises
8 Using probability to its maximum: The naive Bayes model
- Sick or healthy? A story with Bayes’ theorem as the hero
- Prelude to Bayes’ theorem: The prior, the event, and the posterior
- Use case: Spam-detection model
- Finding the prior: The probability that any email is spam
- Finding the posterior: The probability that an email is spam, knowing that it contains a particular word
- What the math just happened? Turning ratios into probabilities
- What about two words? The naive Bayes algorithm
- What about more than two words?
- Building a spam-detection model with real data
- Data preprocessing
- Finding the priors
- Finding the posteriors with Bayes’ theorem
- Implementing the naive Bayes algorithm
- Further work
- Summary
- Exercises
9 Splitting data by asking questions: Decision trees
- The problem: We need to recommend apps to users according to what they are likely to download
- The solution: Building an app-recommendation system
- First step to build the model: Asking the best question
- Second step to build the model: Iterating
- Last step: When to stop building the tree and other hyperparameters
- The decision tree algorithm: How to build a decision tree and make predictions with it
- Beyond questions like yes/no
- Splitting the data using non-binary categorical features, such as dog/cat/bird
- Splitting the data using continuous features, such as age
- The graphical boundary of decision trees
- Using Scikit-Learn to build a decision tree
- Real-life application: Modeling student admissions with Scikit-Learn
- Setting hyperparameters in Scikit-Learn
- Decision trees for regression
- Applications
- Decision trees are widely used in health care
- Decision trees are useful in recommendation systems
- Summary
- Exercises
10 Combining building blocks to gain more power: Neural networks
- Neural networks with an example: A more complicated alien planet
- Solution: If one line is not enough, use two lines to classify your dataset
- Why two lines? Is happiness not linear?
- Combining the outputs of perceptrons into another perceptron
- A graphical representation of perceptrons
- A graphical representation of neural networks
- The boundary of a neural network
- The general architecture of a fully connected neural network
- Training neural networks
- Error function: A way to measure how the neural network is performing
- Backpropagation: The key step in training the neural network
- Potential problems: From overfitting to vanishing gradients
- Techniques for training neural networks: Regularization and dropout
- Different activation functions: Hyperbolic tangent (tanh) and the rectified linear unit (ReLU)
- Neural networks with more than one output: The softmax function
- Hyperparameters
- Coding neural networks in Keras
- A graphical example in two dimensions
- Training a neural network for image recognition
- Neural networks for regression
- Other architectures for more complex datasets
- How neural networks see: Convolutional neural networks (CNN)
- How neural networks talk: Recurrent neural networks (RNN), gated recurrent units (GRU), and long short-term memory networks (LSTM)
- How neural networks paint paintings: Generative adversarial networks (GAN)
- Summary
- Exercises
11 Finding boundaries with style: Support vector machines and the kernel method
- Using a new error function to build better classifiers
- Classification error function: Trying to classify the points correctly
- Distance error function: Trying to separate our two lines as far apart as possible
- Adding the two error functions to obtain the error function
- Do we want our SVM to focus more on classification or distance? The C parameter can help us
- Coding support vector machines in Scikit-Learn
- Coding a simple SVM
- The C parameter
- Training SVMs with nonlinear boundaries: The kernel method
- Using polynomial equations to our benefit: The polynomial kernel
- Using bumps in higher dimensions to our benefit: The radial basis function (RBF) kernel
- Training an SVM with the RBF kernel
- Coding the kernel method
- Summary
- Exercises
12 Combining models to maximize results: Ensemble learning
- With a little help from our friends
- Bagging: Joining some weak learners randomly to build a strong learner
- Fitting a random forest manually
- Training a random forest in Scikit-Learn
- AdaBoost: Joining weak learners in a clever way to build a strong learner
- A big picture of AdaBoost: Building the weak learners
- Combining the weak learners into a strong learner
- Coding AdaBoost in Scikit-Learn
- Gradient boosting: Using decision trees to build strong learners
- XGBoost: An extreme way to do gradient boosting
- XGBoost similarity score: A new and effective way to measure similarity in a set
- Building the weak learners
- Tree pruning: A way to reduce overfitting by simplifying the weak learners
- Making the predictions
- Training an XGBoost model in Python
- Applications of ensemble methods
- Summary
- Exercises
13 Putting it all in practice: A real-life example of data engineering and machine learning
- The Titanic dataset
- The features of our dataset
- Using Pandas to load the dataset
- Using Pandas to study our dataset
- Cleaning up our dataset: Missing values and how to deal with them
- Dropping columns with missing data
- How to not lose the entire column: Filling in missing data
- Feature engineering: Transforming the features in our dataset before training the models
- Turning categorical data into numerical data: One-hot encoding
- Turning numerical data into categorical data (and why would we want to do this?): Binning
- Feature selection: Getting rid of unnecessary features
- Training our models
- Splitting the data into features and labels, and training and validation
- Training several models on our dataset
- Which model is better? Evaluating the models
- Testing the model
- Tuning the hyperparameters to find the best model: Grid search
- Using K-fold cross-validation to reuse our data as training and validation
- Summary
- Exercises
Appendix A: Solutions to the exercises
- Solutions for chapters 2–13 (Exercises 2.1–2.3, 3.1–3.4, 4.1–4.2, 5.1–5.3, 6.1–6.3, 7.1–7.4, 8.1–8.3, 9.1–9.3, 10.1–10.3, 11.1–11.2, 12.1–12.2, and 13.1)
Appendix B: The math behind gradient descent: Coming down a mountain using derivatives and slopes
- Using gradient descent to decrease functions
- Using gradient descent to train models
- Using gradient descent to train linear regression models
- Using gradient descent to train classification models
- Using gradient descent to train neural networks
- Using gradient descent for regularization
- Getting stuck on local minima: How it happens, and how we solve it
Appendix C: References
- General references: courses, blogs and YouTube channels, and books
- Per-chapter references: code, datasets, videos, books, courses, tools, and articles and blog posts
- Graphics and image icons
index