Deep Learning: A Visual Approach

Length: 776 pages
Edition: 1
Language: English
Publisher: No Starch Press
Publication Date: 2021-06-29
ISBN-10: 1718500726
ISBN-13: 9781718500723
Sales Rank: #301397

A richly illustrated, full-color introduction to deep learning that offers visual and conceptual explanations instead of equations. You’ll learn how to use key deep learning algorithms without the need for complex math.

Ever since computers began beating us at chess, they’ve been getting better at a wide range of human activities, from writing songs and generating news articles to helping doctors provide healthcare.

Deep learning is the source of many of these breakthroughs, and its remarkable ability to find patterns hiding in data has made it the fastest-growing field in artificial intelligence (AI). Digital assistants on our phones use deep learning to understand and respond intelligently to voice commands; automotive systems use it to safely navigate road hazards; online platforms use it to deliver personalized suggestions for movies and books – the possibilities are endless.

Deep Learning: A Visual Approach is for anyone who wants to understand this fascinating field in depth, but without any of the advanced math and programming usually required to grasp its internals. If you want to know how these tools work, and use them yourself, the answers are all within these pages. And, if you’re ready to write your own programs, there are also plenty of supplemental Python notebooks in the accompanying GitHub repository to get you going.

The book’s conversational style, extensive color illustrations, illuminating analogies, and real-world examples expertly explain the key concepts in deep learning, including:

How text generators create novel stories and articles
How deep learning systems learn to play and win at human games
How image classification systems identify objects or people in a photo
How to think about probabilities in a way that’s useful to everyday life
How to use the machine learning techniques that form the core of modern AI

Intellectual adventurers of all kinds can use the powerful ideas covered in Deep Learning: A Visual Approach to build intelligent systems that help us better understand the world and everyone who lives in it. It’s the future of AI, and this book allows you to fully envision it.

Cover
Title Page
Copyright
Dedication
About the Author
Acknowledgments
Introduction
    Who This Book Is For
    This Book Has No Complex Math and No Code
    There Is Code, If You Want It
    The Figures Are Available, Too!
    Errata
    About This Book
        Part I: Foundational Ideas
        Part II: Basic Machine Learning
        Part III: Deep Learning Basics
        Part IV: Beyond the Basics
    Final Words
Part I: Foundational Ideas
    Chapter 1: An Overview of Machine Learning
        Expert Systems
        Supervised Learning
        Unsupervised Learning
        Reinforcement Learning 
        Deep Learning
        Summary
    Chapter 2: Essential Statistics
        Describing Randomness 
        Random Variables and Probability Distributions 
        Some Common Distributions
            Continuous Distributions
            Discrete Distributions
        Collections of Random Values
            Expected Value
            Dependence
            Independent and Identically Distributed Variables
        Sampling and Replacement
            Selection with Replacement
            Selection Without Replacement
        Bootstrapping
        Covariance and Correlation
            Covariance
            Correlation
        Statistics Don’t Tell Us Everything 
        High-Dimensional Spaces
        Summary
    Chapter 3: Measuring Performance
        Different Types of Probability
            Dart Throwing
            Simple Probability
            Conditional Probability
            Joint Probability
            Marginal Probability
        Measuring Correctness
            Classifying Samples
            The Confusion Matrix
            Characterizing Incorrect Predictions
            Measuring Correct and Incorrect
            Accuracy
            Precision
            Recall
            Precision-Recall Tradeoff 
            Misleading Measures 
            f1 Score
            About These Terms
            Other Measures
        Constructing a Confusion Matrix Correctly
        Summary
    Chapter 4: Bayes’ Rule
        Frequentist and Bayesian Probability
            The Frequentist Approach
            The Bayesian Approach
            Frequentists vs. Bayesians
        Frequentist Coin Flipping
        Bayesian Coin Flipping
            A Motivating Example
            Picturing the Coin Probabilities
            Expressing Coin Flips as Probabilities
            Bayes’ Rule
            Discussion of Bayes’ Rule
        Bayes’ Rule and Confusion Matrices
        Repeating Bayes’ Rule
            The Posterior-Prior Loop
            The Bayes Loop in Action
        Multiple Hypotheses
        Summary
    Chapter 5: Curves and Surfaces
        The Nature of Functions
        The Derivative
            Maximums and Minimums
            Tangent Lines
            Finding Minimums and Maximums with Derivatives
        The Gradient
            Water, Gravity, and the Gradient
            Finding Maximums and Minimums with Gradients
            Saddle Points
        Summary
    Chapter 6: Information Theory
        Surprise and Context
            Understanding Surprise
            Unpacking Context
        Measuring Information
        Adaptive Codes
            Speaking Morse 
            Customizing Morse Code
        Entropy
        Cross Entropy
            Two Adaptive Codes
            Using the Codes
            Cross Entropy in Practice
        Kullback–Leibler Divergence
        Summary
Part II: Basic Machine Learning
    Chapter 7: Classification
        Two-Dimensional Binary Classification
        2D Multiclass Classification
        Multiclass Classification 
            One-Versus-Rest
            One-Versus-One
        Clustering
        The Curse of Dimensionality
            Dimensionality and Density
            High-Dimensional Weirdness
        Summary
    Chapter 8: Training and Testing
        Training
        Testing the Performance
            Test Data
            Validation Data
        Cross-Validation
        k-Fold Cross-Validation
        Summary
    Chapter 9: Overfitting and Underfitting
        Finding a Good Fit
            Overfitting
            Underfitting
        Detecting and Addressing Overfitting
            Early Stopping
            Regularization
        Bias and Variance
            Matching the Underlying Data
            High Bias, Low Variance
            Low Bias, High Variance
            Comparing Curves
        Fitting a Line with Bayes’ Rule
        Summary
    Chapter 10: Data Preparation
        Basic Data Cleaning
        The Importance of Consistency
        Types of Data
        One-Hot Encoding
        Normalizing and Standardizing
            Normalization
            Standardization
            Remembering the Transformation
        Types of Transformations
            Slice Processing
            Samplewise Processing
            Featurewise Processing
            Elementwise Processing
        Inverse Transformations
        Information Leakage in Cross-Validation
        Shrinking the Dataset
            Feature Selection
            Dimensionality Reduction
        Principal Component Analysis
            PCA for Simple Images
            PCA for Real Images
        Summary
    Chapter 11: Classifiers
        Types of Classifiers
        k-Nearest Neighbors
        Decision Trees
            Using Decision Trees
            Overfitting Trees
            Splitting Nodes
        Support Vector Machines
            The Basic Algorithm
            The SVM Kernel Trick 
        Naive Bayes
        Comparing Classifiers
        Summary
    Chapter 12: Ensembles
        Voting
        Ensembles of Decision Trees
            Bagging
            Random Forests
            Extra Trees
        Boosting
        Summary
Part III: Deep Learning Basics
    Chapter 13: Neural Networks
        Real Neurons
        Artificial Neurons
            The Perceptron
            Modern Artificial Neurons
        Drawing the Neurons
        Feed-Forward Networks
        Neural Network Graphs
        Initializing the Weights
        Deep Networks
        Fully Connected Layers
        Tensors
        Preventing Network Collapse 
        Activation Functions
            Straight-Line Functions
            Step Functions
            Piecewise Linear Functions
            Smooth Functions
            Activation Function Gallery
            Comparing Activation Functions
        Softmax
        Summary
    Chapter 14: Backpropagation
        A High-Level Overview of Training 
            Punishing Error
            A Slow Way to Learn
            Gradient Descent
        Getting Started
        Backprop on a Tiny Neural Network
            Finding Deltas for the Output Neurons
            Using Deltas to Change Weights
            Other Neuron Deltas
        Backprop on a Larger Network
        The Learning Rate
            Building a Binary Classifier
            Picking a Learning Rate
            An Even Smaller Learning Rate
        Summary 
    Chapter 15: Optimizers
        Error as a 2D Curve
        Adjusting the Learning Rate
            Constant-Sized Updates
            Changing the Learning Rate over Time
            Decay Schedules
        Updating Strategies
            Batch Gradient Descent
            Stochastic Gradient Descent 
            Mini-Batch Gradient Descent
        Gradient Descent Variations
            Momentum
            Nesterov Momentum
            Adagrad
            Adadelta and RMSprop
            Adam
        Choosing an Optimizer
        Regularization
            Dropout
            Batchnorm
        Summary
Part IV: Beyond the Basics
    Chapter 16: Convolutional Neural Networks
        Introducing Convolution
            Detecting Yellow
            Weight Sharing
            Larger Filters
            Filters and Features
            Padding
        Multidimensional Convolution
        Multiple Filters
        Convolution Layers
            1D Convolution
            1×1 Convolutions 
        Changing Output Size 
            Pooling
            Striding
            Transposed Convolution
        Hierarchies of Filters
            Simplifying Assumptions
            Finding Face Masks
            Finding Eyes, Noses, and Mouths
            Applying Our Filters
        Summary
    Chapter 17: Convnets in Practice
        Categorizing Handwritten Digits
        VGG16
        Visualizing Filters, Part 1
        Visualizing Filters, Part 2
        Adversaries
        Summary
    Chapter 18: Autoencoders
        Introduction to Encoding
            Lossless and Lossy Encoding
        Blending Representations
        The Simplest Autoencoder
        A Better Autoencoder
        Exploring the Autoencoder
            A Closer Look at the Latent Variables
            The Parameter Space
            Blending Latent Variables
            Predicting from Novel Input
        Convolutional Autoencoders
            Blending Latent Variables
            Predicting from Novel Input
        Denoising
        Variational Autoencoders
            Distribution of Latent Variables
            Variational Autoencoder Structure
        Exploring the VAE
            Working with the MNIST Samples
            Working with Two Latent Variables
            Producing New Input
        Summary
    Chapter 19: Recurrent Neural Networks
        Working with Language
            Common Natural Language Processing Tasks
            Transforming Text into Numbers
            Fine-Tuning and Downstream Networks
        Fully Connected Prediction
            Testing Our Network
            Why Our Network Failed
        Recurrent Neural Networks
            Introducing State
            Rolling Up Our Diagram
            Recurrent Cells in Action
            Training a Recurrent Neural Network
            Long Short-Term Memory and Gated Recurrent Networks
        Using Recurrent Neural Networks
            Working with Sunspot Data
            Generating Text
            Different Architectures
        Seq2Seq
        Summary
    Chapter 20: Attention and Transformers
        Embedding
            Embedding Words
            ELMo
        Attention
            A Motivating Analogy
            Self-Attention
            Q/KV Attention
            Multi-Head Attention
            Layer Icons
        Transformers
            Skip Connections
            Norm-Add
            Positional Encoding
            Assembling a Transformer
            Transformers in Action
        BERT and GPT-2
            BERT
            GPT-2
            Generators Discussion
            Data Poisoning
        Summary
    Chapter 21: Reinforcement Learning
        Basic Ideas
        Learning a New Game
        The Structure of Reinforcement Learning
            Step 1: The Agent Selects an Action
            Step 2: The Environment Responds
            Step 3: The Agent Updates Itself
            Back to the Big Picture
            Understanding Rewards
        Flippers
        L-Learning
            The Basics
            The L-Learning Algorithm
            Testing Our Algorithm
            Handling Unpredictability
        Q-Learning
            Q-Values and Updates
            Q-Learning Policy
            Putting It All Together
            The Elephant in the Room
            Q-Learning in Action
        SARSA
            The Algorithm
            SARSA in Action
            Comparing Q-Learning and SARSA
        The Big Picture
        Summary
    Chapter 22: Generative Adversarial Networks
        Forging Money
            Learning from Experience
            Forging with Neural Networks
            A Learning Round
            Why Adversarial?
        Implementing GANs
            The Discriminator
            The Generator
            Training the GAN
        GANs in Action
            Building a Discriminator and Generator 
            Training Our Network
            Testing Our Network
        DCGANs
        Challenges
            Using Big Samples
            Modal Collapse
            Training with Generated Data
        Summary
    Chapter 23: Creative Applications
        Deep Dreaming
            Stimulating Filters
            Running Deep Dreaming
        Neural Style Transfer
            Representing Style
            Representing Content
            Style and Content Together
            Running Style Transfer
        Generating More of This Book
        Summary
        Final Thoughts
References
    Chapter 1
    Chapter 2
    Chapter 3
    Chapter 4
    Chapter 5
    Chapter 6
    Chapter 7
    Chapter 8
    Chapter 9
    Chapter 10
    Chapter 11
    Chapter 12
    Chapter 13
    Chapter 14
    Chapter 15
    Chapter 16
    Chapter 17
    Chapter 18
    Chapter 19
    Chapter 20
    Chapter 21
    Chapter 22
    Chapter 23
Image Credits 
    Chapter 1
    Chapter 10
    Chapter 16
    Chapter 17
    Chapter 18
    Chapter 23
Index