Pragmatic Machine Learning with Python: Learn How to Deploy Machine Learning Models in Production
- Length: 338 pages
- Edition: 1
- Language: English
- Publisher: BPB Publications
- Publication Date: 2020-04-30
- ISBN-10: 938984536X
- ISBN-13: 9789389845365
- Sales Rank: #2701176
An easy-to-understand guide to learning practical Machine Learning techniques with mathematical foundations
Key Features
- A balanced combination of underlying mathematical theories & practical examples with Python code
- Coverage of recent topics such as multi-label classification, Text Mining, Doc2Vec, Word2Vec, XMeans clustering, unsupervised outlier detection, and techniques to deploy ML models in production-grade systems with PMML
- Coverage of relevant visualization techniques specific to each topic
Description
This book is ideal for working professionals who want to learn Machine Learning from scratch. The first chapter is introductory and makes readers comfortable with the idea of Machine Learning and the required mathematical theory. Each Machine Learning topic pairs the underlying mathematical theory with its implementation in Python. Most implementations are based on scikit-learn, but other Python libraries such as Gensim and PyTorch are used for topics like text analytics and deep learning. The chapters are organized by primary Machine Learning topic: Classification, Regression, Clustering, Deep Learning, Text Mining, and so on. The book also explains different techniques for putting Machine Learning models into production-grade systems, in both Big Data and non-Big Data settings, along with standards for exporting models.
What will you learn
- Get familiar with practical concepts of Machine Learning from ground zero
- Learn how to deploy Machine Learning models in production
- Understand how to do “Data Science Storytelling”
- Explore the latest topics in the current industry about Machine Learning
Who this book is for
This book is ideal for experienced software professionals who are trying to get into the field of Machine Learning, and for anyone who wishes to learn Machine Learning concepts and models in the production lifecycle.
Table of Contents
1. Introduction to Machine Learning & Mathematical preliminaries
2. Classification
3. Regression
4. Clustering
5. Deep Learning & Neural Networks
6. Miscellaneous Unsupervised Learning
7. Text Mining
8. Machine Learning models in production
9. Case Studies & Data Science Storytelling
About the Author
Avishek has a Master’s degree in Data Analytics & Machine Learning from BITS (Pilani) and a Bachelor’s degree in Computer Science from West Bengal University of Technology (WBUT). He has more than 14 years of experience at well-known companies such as VMware, Cognizant, Cisco, and Mobile Iron, among others. He started his career as a Java developer and moved to the core area of Machine Learning around five years ago. He has practical experience in the design & development of Machine Learning systems, from inception to production, in multiple organizations. Strong foundations in Mathematics/Statistics and solid experience in product development have helped him excel quickly in the world of ML & Data Science. He shares his knowledge & experience through this book, which can help any Software Engineer get started in this area. He also blogs at https://medium.com/@avisheknag17
LinkedIn profile: https://www.linkedin.com/in/avishek-nag-957a0015/
Cover Page Title Page Copyright Page Dedication About the Author Acknowledgement Preface Errata Table of Contents
1. Introduction to Machine Learning and Mathematical Preliminaries Structure Objective Purpose of machine learning What is a machine learning model? What is a dataset? What are the variables and features? Predictor and target variables Types of variables: Continuous and categorical Lifecycle of a machine learning model Pre-conditions of a successful ML project Different types of the learning process Supervised learning Unsupervised learning Parameter and hyperparameter Machine learning models by objective Predictive machine learning Descriptive machine learning Machine learning models by problem type Classification model Regression model Clustering model Dimensionality reduction model Machine learning models by assumptions Parametric model Non-parametric model Accuracy of the ML model Training and testing dataset Accuracy for classification Accuracy for regression Accuracy for clustering Bias-Variance decomposition Underfitting and overfitting Mathematical concepts in machine learning Definition of data point Dataset as a vector space Norm of a Vector Euclidean distance Similarity of vectors Eigenvalues and Eigenvectors Variable transformation and imputation Scaling and normalization Min-max scaling Standard scaling Categorical to continuous variable transformation One-hot encoding Continuous to categorical variable transformation Imputation Measures of variance Coefficient of variance (CV) Conclusion
2. Classification Structure Objective Problem formulation Binary and multi-class Class boundary Linear and non-linear class boundary The general approach for solving classification A brief introduction to scikit-learn Training process – fit function Testing/validation process – predict function Concept of pipeline Without pipeline With Pipeline The Bayesian approach of classification Applying Bayes theorem in classification Prior and posterior probability Formulation Naïve Bayes classifier Conditional independence Accuracy Example using an abstract dataset Training process Validation with a data instance Laplace estimation Handling continuous attributes Naïve Bayes Classifier using Python Pre-processing Training and testing set Building the pipeline Gaussian Naïve Bayes Multinomial Naïve Bayes Advantage of Naïve Bayes Classifier The disadvantage of Naïve Bayes Classifier Logistic Regression Classifier Training process Logistic Regression Classifier using Python Pre-processing and building the pipeline Overfitting and regularization Regularization Multi-Class Logistic Regression Advantage of Logistic Regression Classifier The disadvantage of Logistic Regression Classifier Decision Tree Classifier Anatomy of a Decision Tree Handling continuous and categorical attributes Categorical attribute Continuous attribute Measure and technique of splitting a node Impurity and mathematical measures ID3 algorithm of building Decision Tree Python implementation of the Decision Tree Pre-processing and building the pipeline Visualization of the Decision Tree using Python Advantage of Decision Tree Classifier The disadvantage of Decision Tree Classifier Class imbalance problem Alternative metrics Confusion Matrix Ratio based metrics Accuracy metric for multi-class and imbalanced dataset Receiver Operating Characteristic Curve (ROC) Python implementation of ROC generation Mitigating class imbalance problem Class weight adjustment approach Sampling-based approach Ensemble classification models Bagging RandomForest model Python implementation of RandomForest Multi-label classification models Problem formulation Problem decomposition and Umbrella classification scheme approach Binary relevance scheme Classifier chain scheme Label powerset scheme Comparison of each scheme Accuracy metrics Hamming loss metric Python implementation of multi-label classifier Conclusion
3. Regression Structure Objectives Mathematical problem definition of Regression Linear vs. non-linear relationships Conversion between linear and non-linear relationships Building a linear regression model General approach to solving linear regression Ordinary Least Squares (OLS) Gradient Descent Accuracy of linear regression (R2 measure) Selection of features in linear regression Adjusted R2 Forward selection Backward selection Forward or backward: When to use what Key points to remember in linear regression Polynomial regression Regularization L1 regularization or Lasso L2 regularization or Ridge Parametric regression models to explain facts Tree-based regression Comparison of different regression techniques Conclusion
4. Clustering Structure Objectives Formal definition of clustering Concept of cluster Similarity metrics Center-based clustering K-means clustering Basic K-means algorithm Hyper-parameters Python implementation of K-means clustering Pre-processing Pipeline creation Clustering as new feature space The sensitivity of KMeans with centroid initialization Visualization of clusters Accuracy metrics Cohesion and separation of clusters Silhouette coefficient Python implementation of cluster metric Advantages of KMeans Disadvantages of KMeans Determining optimal K in KMeans Elbow method XMeans clustering Computation of Log-likelihood Python implementation of XMeans Density-based clustering DBSCAN clustering Python implementation of DBSCAN clustering Visualization of clusters Advantages of DBSCAN Disadvantages of DBSCAN Determining optimal parameters of DBSCAN K-distance plot for determining Eps Hierarchical clustering Python implementation of Agglomerative clustering Visualization of clusters Visualization of hierarchical clusters with a dendrogram Clustering to solve a classification problem Computation of class probabilities Classification process Python implementation of the model Visualization of clusters and classification accuracy Key points to remember about clustering-based classification Conclusion
5. Deep Learning Structure Objectives What is deep learning? Why is deep learning required? Neural network Anatomy of a neural network Perceptron Activation function Sigmoid or logistic Tanh ReLU (Rectified Linear Unit) Linear Layers Loss function Mean Squared Error (MSE) Cross-Entropy Loss Optimizer Stochastic Gradient Descent (SGD) Mini-batch Gradient Descent Building a neural network model Training process of neural network Forward propagation Backward propagation Stopping criteria Applying neural network for classification and regression problem Deciding the number of hidden layers and perceptrons Classification problem The heuristic approach of deciding the number of nodes in hidden layers Regression problem Conventions of building MLP (Multilayer Perceptron) model Different types of neural network Convolutional Neural Network (CNN) Convolution operation Significance of convolution in the neural network model Anatomy of a CNN Convolution layer Max/average-pool layer Feature engineering of images using CNN A brief introduction to PyTorch for designing a neural network CNN using PyTorch Input and output channel Auto-Encoder Disadvantages of deep learning and neural network Conclusion
6. Miscellaneous Unsupervised Learning Structure Objectives Dimensionality reduction Principal Component Analysis (PCA) Computation of PC from Covariance and Eigenvectors PCA and Co-relation coefficient Classification using principal components Regression using principal components Key points to remember about PCA Unsupervised outlier detection Outlier detection using Auto-Encoder Architecture of the Auto-Encoder for outlier detection Metric to measure outlier factor Training of Auto-Encoder Testing the result Key points to remember about Auto Encoder based outlier detection Outlier detection using clustering Center-based clustering algorithm for outlier detection Density-based clustering algorithm for outlier detection Testing the accuracy Key points to remember about DBSCAN based outlier detection Outlier detection using Isolation Forest Isolation Tree Isolation Forest Accuracy Key points to remember about Isolation Forest-based outlier detection Conclusion
7. Text Mining Structure Objectives Analyzing text What are a corpus and document? Pre-processing of text Steps of cleaning text Vector space models of text TF-IDF model Word2Vec model Skip-Gram Word2Vec model CBOW (Continuous Bag of Words) Word2Vec model Comparison of Skip-Gram and CBOW Doc2Vec model Average of Word2Vec Distributed Memory Model (PV-DM) of Doc2Vec Distributed Bag of Words of Paragraph Vectors Model (PV-DBOW) of Doc2Vec Comparison of different Doc2Vec models Comparisons of different vector space models Text classification techniques Visualization techniques for text Histogram Word Cloud Naïve Bayes Classifier for text TF-IDF with Naïve Bayes classifier Doc2Vec with Naïve Bayes classifier Measuring text similarity Text clustering Conclusion
8. Machine Learning Models in Production Structure Objectives Challenges of putting a model into production Exposing model as a service Save and load a model A brief introduction to Flask Exposing model as Flask REST API Adding scalability support Scalability for storage Scalability for computing Apache Spark-MLlib and pipeline Building platform-independent model descriptor Predictive Model Markup Language (PMML) Elements/tags of a PMML document Generation of PMML document from a model Installation of required libraries scikit-learn pipeline to PMML conversion How to use PMML document Python client for PMML model Java client for PMML model Building overall architecture Model deployed in batch mode Model deployed in ad-hoc/real-time mode Conclusion
9. Case Studies and Storytelling Structure Objectives What is data science storytelling? Machine learning model Visualizations Facts Case study 1: Analysis of sales-profit for superstore sales data from tableau user group using multivariate regression techniques Data source and problem definition Data exploration Data filtering Data pre-processing Building the model Analysis of result and dimension-measure relationships Subcategory vs. profit analysis Quantity vs. profit analysis Postal code vs. profit analysis Sales vs. profit analysis Discount vs. profit analysis Product name vs. profit analysis Case study 2: Prediction of movie genres with multilabel text classification Data source and problem definition Data exploration Building the model Analysis of result and testing the model Case study 3: Classification of natural images using CNN and PyTorch Data source and problem definition Data exploration Effect of applying convolution filter Building the model Analysis of result and testing the model Conclusion