Practical Mathematics for AI and Deep Learning: A Concise yet In-Depth Guide on Fundamentals of Computer Vision, NLP, Complex Deep Neural Networks and Machine Learning
- Length: 755 pages
- Edition: 1
- Language: English
- Publisher: BPB Publications
- Publication Date: 2022
- ASIN: B0BRCP4NX1
- Sales Rank: #650371
Mathematical Codebook to Navigate Through the Fast-changing AI Landscape
Key Features
- Industry-recognized AI methodology and deep learning mathematics, presented through simple-to-understand examples.
- Encompasses MDP Modeling, the Bellman Equation (see the illustrative equation after this list), Auto-regressive Models, BERT, and Transformers.
- Detailed, line-by-line diagrams of algorithms and the mathematical computations they perform.
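As a taste of the level of mathematics involved, the Bellman optimality equation for a Markov Decision Process, one of the topics listed above, can be written in its standard textbook form (the book's own notation may differ):

$$V^{*}(s) = \max_{a}\sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V^{*}(s')\bigr]$$

where $V^{*}(s)$ is the optimal value of state $s$, $P(s' \mid s, a)$ is the transition probability, $R(s, a, s')$ is the reward, and $\gamma \in [0, 1)$ is the discount factor.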
Description
To build a system that can genuinely be said to possess 'Artificial Intelligence,' you must be able to design algorithms that make automated, data-driven decisions under uncertainty. Accomplishing this requires an in-depth understanding of the more sophisticated parts of linear algebra, vector calculus, probability, and statistics. This book walks you through each mathematical algorithm, along with its architecture, operation, and design, so that you can understand how any artificial intelligence system operates.
This book teaches the common terminology used in artificial intelligence, such as models, data, model parameters, and dependent and independent variables. Bayesian linear regression, Gaussian mixture models, stochastic gradient descent, and the backpropagation algorithm are explored, with implementations built from scratch. The advanced mathematics behind complex AI computations, such as autoregressive models, CycleGANs, and CNN optimization, is explained and compared.
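To give a flavor of the from-scratch style described above, here is a minimal, illustrative sketch of stochastic gradient descent fitting a simple linear model. The synthetic data, variable names, and hyperparameters are assumptions made for this example; it is not code from the book.

```python
import numpy as np

# Illustrative sketch only: stochastic gradient descent fitting a
# linear model y = w*x + b from scratch. The synthetic data and
# hyperparameters are assumptions for this example, not the book's code.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=200)  # true w = 3.0, b = 0.5

w, b = 0.0, 0.0  # parameters to learn
lr = 0.1         # learning rate
for epoch in range(20):
    for i in rng.permutation(len(x)):    # visit samples in random order
        err = (w * x[i] + b) - y[i]      # prediction error on one sample
        w -= lr * err * x[i]             # gradient of 0.5*err**2 w.r.t. w
        b -= lr * err                    # gradient of 0.5*err**2 w.r.t. b

print(f"learned w = {w:.2f}, b = {b:.2f}")  # should approach 3.00 and 0.50
```

Each update steps the parameters along the negative gradient of the squared error on a single sample, which is the per-sample update rule behind the SGD and backpropagation material the book covers.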
Along the way, you will acquire knowledge that extends beyond the mathematics itself: you will become familiar with numerous AI training methods, a range of NLP tasks, and techniques for reducing the dimensionality of data.
What you will learn
- Learn to think like a professional data scientist by selecting the best-performing AI algorithm for a given problem.
- Expand your mathematical horizons to include the most cutting-edge AI methods.
- Learn about Transformer Networks, improving CNN performance, dimensionality reduction, and generative models.
- Explore several neural network designs as a starting point for constructing your own NLP and Computer Vision architectures.
- Create specialized loss functions and tailor-made AI algorithms for a given business application.
Who this book is for
Researchers and professionals interested in artificial intelligence and its computational foundations, including machine learning, data science, deep learning, computer vision, and natural language processing (NLP), will find this book an excellent companion. It also serves as a quick reference for practitioners who already apply a variety of mathematical techniques but do not fully understand the underlying principles.
Table of Contents
Front matter: Cover Page; Title Page; Copyright Page; Dedication Page; About the Authors; About the Reviewer; Acknowledgements; Preface; Errata; Table of Contents
1. Overview of AI: Structure; Objectives; AI systems; Machine Learning; How are ML Models created?; Data types; Learning from data; Types of ML algorithm; Unsupervised learning; Reinforcement learning; Supervised learning; Metrics for evaluating classification model; Metrics for evaluating regression model; Deep learning; Dataset preparation; Application of AI; Role of Mathematics in AI; Conclusion
2. Linear Algebra: Structure; Objectives; Linear equations; Solving system of equations analytically; Infinitely many solutions; Inconsistent system; Introducing matrix; Augmented matrix; Pseudocode forward substitution; Pseudocode back substitution; Basic matrix operations; Euclidean space; Vectors and basic properties; Representing vector; Norm; Direction; Scalar multiplication; Addition/subtraction of vectors; Distance between vectors; Dot product and orthogonality; Linear Combination of Vectors; Dimension and basis of the space; Orthogonal and orthonormal basis; Natural orthonormal basis of ℝⁿ; Subspaces; Dimension of subspace; Hyperplanes and Halfspaces; Defining vector space; Vector spaces; Normed vector space; Norm of real numbers; ℓp norm; Maximum norm; Matrix norm; Inner product; Application on real dataset; K-nearest neighbor; Representing vectors in matrix; Matrix rank; Matrix types; Identity matrix; Symmetric matrix; Skew-symmetric matrix; Invertible matrices; Properties of Matrix Inverse; Permutation matrix; Orthogonal matrix; Matrices in ML problem formulation; Feature/data matrix; One-hot encoding; Distance matrix; Gram matrix; Covariance matrix; Correlation matrix; Jacobian and Hessian matrix; Subspaces of matrix and orthogonality; Null space; Orthogonality among subspaces; Determinant; Inverse of Matrix; Orthonormalization; Applications of Orthonormalization; Linear transformation; Matrix associated with linear map; Composition of linear transformations; Eigenvalues and vectors; Eigen properties; Geometric analysis; Existence of zero eigenvalue; Eigen properties of symmetric matrices; Positive definite; Matrix decomposition; LU decomposition; By-product of Gauss-Jordan elimination; QR decomposition; Eigen decomposition; Real symmetric matrix; Singular value decomposition; Conclusion; Points to remember; Further Reading
3. Vector Calculus: Structure; Objectives; Analysis of real functions; Limit of a function; Continuous functions; Derivative of a function; Higher order derivatives; Taylor series expansion; Scalar and vector fields; Limits and continuity; Derivative of scalar fields w.r.t. vector; Directional derivative and partial derivatives; Total derivative; Geometry of gradient vector; Derivative of vector fields w.r.t. vector; Chain rule for derivatives of vector fields; Matrix form of the chain rule; Tensors; Einstein notation; Dot product of tensors; Tensor calculus; Total derivative of tensor; Mathematical optimization; Maxima, minima, and saddle point; Descent methods; Function optimization with constraints: Lagrange multipliers; Optimization with inequality constraints; The Lagrange dual function; Convex functions; Properties of convex functions; Convex optimization; Karush-Kuhn-Tucker (KKT) conditions; Conclusion; Points to remember; Further readings
4. Basic Statistics and Probability Theory: Structure; Objectives; Basic statistics; Measures of central tendency; Mean; Median; Mode; Partition values; Measures of dispersion; Range; Interquartile range; Mean deviation; Standard deviation; Coefficients of dispersion; Moments; Skewness and kurtosis; Correlation; Probability and odds; Random experiment; Events as sets; Conditional probability; Independent events; Conditional independence; Total probability theorem; Bayes theorem; Bayesian Decision Theory; Random variable; Discrete probability distributions; Bernoulli and categorical distribution; Binomial distribution; Poisson distribution; Continuous probability distributions; Cumulative Probability Distribution Function (CDF); Uniform distribution; Gaussian distribution or normal distribution; Exponential distribution; Mathematical expectation of a random variable; Joint probability distributions; Transformation of a random variable; Multivariate distributions; Multinomial distribution; Multivariate Gaussian distribution; Information theory; Entropy; Relative entropy or KL divergence; Mutual information; Decision tree; Conclusion; Points to remember; Further reading
5. Statistical Inference and Applications: Structure; Objectives; Large sample theory; Sample statistics; Sampling from known distributions; Hypothesis testing; Statistical inference; Estimator properties; Minimum Variance Unbiased (MVU) estimators; Likelihood function; Cramer-Rao inequality; Method of Maximum Likelihood Estimation (MLE); Bias-variance decomposition of estimator; Applications: Formulating ML problems as statistical inferencing; Data distribution; Classification; Naive Bayes classifier; Regression; Linear and curvilinear regression; Estimating model parameters; Iterative estimation of model parameters; Overfitting and underfitting; Bias-variance trade-off; Logistic regression; Multiclass logistic regression; Poisson regression; Interpretability of linear models; Conclusion; Points to remember; Further Reading
6. Neural Networks: Structure; Objectives; Artificial neuron: An adaptive basis function; Feed-forward neural network; Training neural network; Stochastic Gradient Descent; Computing error derivatives; Backpropagation algorithm; Challenges of training neural networks; Modifications of SGD; Momentum methods; Adaptive learning rate; Bias-variance trade-off in neural networks; Regularization of neural nets; Sensitivity of neural networks to small perturbations; Neural network architectures; Conclusion; Points to remember; Further Reading
7. Clustering: Structure; Objectives; Forming clusters; Distance and similarity; Cluster quality; Internal evaluation; Davies-Bouldin indicator; Dunn indicator; Silhouette coefficient; External evaluation; Rand index; F-measure; Fowlkes-Mallows index; Jaccard index; Clustering algorithms; Partition-based clustering; K-means; K-medoids; Density-based clustering; DBSCAN; Distribution-based clustering; Gaussian Mixture Model; Hierarchical-based clustering; Agglomerative clustering; Distance between clusters; BIRCH; Graph-based clustering; Fuzzy theory-based clustering; Fuzzy c-means; Conclusion; References
8. Dimensionality Reduction: Structure; Objectives; Reducing dimensionality; Principal Component Analysis; Loading the Iris dataset; Calculating covariance matrix; Decomposition of covariance matrix; Reducing with principal components; Variance retention; When to use PCA; Autoencoder; Iris autoencoder; t-SNE; Choosing σᵢ; PCA vs t-SNE; t-SNE on Iris dataset; Conclusion; Further reading; References
9. Computer Vision: Structure; Objectives; Digital image formation; Capture the light; Sampling and quantization; Pixels; Accessing pixels; Spatial filtering; Geometric spatial transformation; Neighbor pixel operation; Convolution properties; Separable kernels; Convolution with separable kernels; Gaussian kernel; Discrete approximation of Gaussian function; Application of Gaussian filter; Image derivative-based kernels; Laplacian kernel: Second-order derivative; Sobel kernel: First-order derivative; Non-linear filters; Learning filters; Convolutional Neural Networks; Convolution layer; Pooling layer; Spatially separable convolution; Depthwise separable convolution; Depthwise convolution; Pointwise convolution; Optimization; Upsampling: Transposed convolution; Development of CNN; AlexNet; TensorFlow Model; Counting trainable parameters; Inception; VGG; ResNet; Xception; Application of CNN models; Image classification; Object detection; R-CNN: Regions with CNN features; YOLO: You Only Look Once; Image segmentation; U-Net; Summary; Further reading; Points to remember; References
10. Sequence Learning Models: Structure; Objectives; Time series models; Decomposition of time series; Differencing; Time series forecasting; OLS model; Exponential smoothing; Autoregressive Integrated Moving Average; Probabilistic sequence models; Markov chain; Hidden Markov model; Recurrent neural networks; Training RNN; Long Short-Term Memory (LSTM); Gated Recurrent Unit (GRU); Stacked LSTM/RNN; Generative models for sequence; Handwriting generation; Mixture Density Network; Sequence classification; Bi-directional RNN; Sequence to Sequence; Connectionist Temporal Classification; Training CTC network: Maximum likelihood; DP formulation for CTC loss; Inferencing from CTC network; Encoder-Decoder architecture; Attention mechanism; Key-value-query formulation of attention; Language translation model; Speech recognition model; Self-attention and transformers; Computing self-attention; Transformer architecture; Conclusion; Points to remember; Further Reading
11. Natural Language Processing: Structure; Objectives; Natural language; Syntactic structure of language; Parts of Speech (POS); Phrases; Clause; Sentence; Document and text corpus; Semantic structure of language; WordNet; Text preprocessing; Models for text; Bag of Words (BoW) model; Vector Space Model; Count-based or Boolean; Term Frequency (TF)-Inverse Document Frequency (IDF); Latent Semantic Indexing (LSI) model; Probabilistic models of text; Topic models; Probabilistic generative models: Latent Dirichlet allocation; Neural language models; Contextual models; ELMo model; BERT; Position encoding; Pre-training BERT; Input representation for pre-training tasks of BERT; WordPiece tokenization; ERNIE; Generative Pre-Training by OpenAI; Conclusion; Points to remember; Further reading
12. Generative Models: Structure; Objectives; A simple generative model; Variational Autoencoders (VAE); Generative Adversarial Nets; Equilibrium state for GAN training; Implementing GAN; GAN training challenges; Solutions for mitigating GAN training issues; Wasserstein GAN (WGAN); Some properties of EM distance; WGAN training; Ensuring Lipschitz constraint in discriminator; Conditional GAN (cGAN); Cycle GAN (CycleGAN); Autoregressive generative models; Applying generative models; Conclusion; Points to remember; Further Reading
Index