Deep Learning with PyTorch Step-by-Step: A Beginner’s Guide
- Length: 1187 pages
- Edition: 1
- Language: English
- Publisher: Leanpub
- Publication Date: 2021-05-18
If you’re looking for a book that’s easy and enjoyable to read, where you can learn about Deep Learning and PyTorch without spending hours deciphering cryptic text and code, this is it 🙂
The book covers everything from the basics of gradient descent all the way up to fine-tuning large NLP models (BERT and GPT-2) using HuggingFace. It is divided into four parts:
- Part I: Fundamentals (gradient descent, training linear and logistic regressions in PyTorch)
- Part II: Computer Vision (deeper models and activation functions, convolutions, transfer learning, initialization schemes)
- Part III: Sequences (RNN, GRU, LSTM, seq2seq models, attention, self-attention, transformers)
- Part IV: Natural Language Processing (tokenization, embeddings, contextual word embeddings, ELMo, BERT, GPT-2)
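To give you a taste of Part I, here is a minimal sketch (my own illustration, not code from the book) of the five gradient descent steps applied to a simple linear regression in PyTorch; the synthetic data and hyperparameters are illustrative:

```python
import torch

# Synthetic data: y = 1 + 2x + noise
torch.manual_seed(42)
x = torch.rand(100, 1)
y = 1 + 2 * x + 0.1 * torch.randn(100, 1)

# Step 0: random initialization of the parameters
b = torch.randn(1, requires_grad=True)
w = torch.randn(1, requires_grad=True)

lr = 0.1
for epoch in range(1000):
    # Step 1: compute the model's predictions
    yhat = b + w * x
    # Step 2: compute the loss (mean squared error)
    loss = ((yhat - y) ** 2).mean()
    # Step 3: compute the gradients via autograd
    loss.backward()
    # Step 4: update the parameters (outside the graph)
    with torch.no_grad():
        b -= lr * b.grad
        w -= lr * w.grad
    b.grad.zero_()
    w.grad.zero_()
    # Step 5: rinse and repeat!

print(b.item(), w.item())  # both should end up close to the true 1 and 2
```

Notice that Step 3 is a single call to `backward()`: autograd computes the gradients for you, which is one of the reasons PyTorch makes this loop so short.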
This is not a typical book: most tutorials start with some nice and pretty image classification problem to illustrate how to use PyTorch. It may seem cool, but I believe it distracts you from the main goal: how does PyTorch work? In this book, I present a structured, incremental, first-principles approach to learning PyTorch (and we get to the pretty image classification problem in due time).
Moreover, this is not a formal book in any way: I am writing this book as if I were having a conversation with you, the reader. I will ask you questions (and give you answers shortly afterward) and I will also make (silly) jokes.
My job here is to make you understand the topic, so I will avoid fancy mathematical notation as much as possible and spell it out in plain English.
In this book, I will guide you through the development of many models in PyTorch, showing you why PyTorch makes it much easier and more intuitive to build models in Python: autograd, dynamic computation graph, model classes and much, much more.
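For instance, here is a tiny sketch (my own, assuming nothing beyond plain PyTorch) of a model class, the dynamic computation graph, and autograd working together:

```python
import torch
import torch.nn as nn

# A model class: subclass nn.Module and its parameters are tracked automatically
class LinearModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)

model = LinearModel()
x = torch.tensor([[1.0], [2.0]])
y = torch.tensor([[3.0], [5.0]])

# The computation graph is built dynamically, while the forward pass runs...
loss = ((model(x) - y) ** 2).mean()
# ...and autograd traverses it backward, filling in .grad for every parameter
loss.backward()
for name, p in model.named_parameters():
    print(name, p.grad)
```

Because the graph is rebuilt on every forward pass, you can use ordinary Python control flow (ifs, loops) inside `forward` and still get correct gradients.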
We will build, step-by-step, not only the models themselves but also your understanding as I show you both the reasoning behind the code and how to avoid some common pitfalls and errors along the way.
I wrote this book for beginners in general – not only PyTorch beginners. Every now and then I will spend some time explaining fundamental concepts that I believe are key to a proper understanding of what’s going on in the code.
Maybe you already know some of those concepts well: if that’s the case, you can simply skip them, since I’ve made those explanations as independent as possible from the rest of the content.
Table of Contents

- Preface
- Acknowledgements
- About the Author
- Frequently Asked Questions (FAQ): Why PyTorch? Why this book? Who should read this book? What do I need to know? How to read this book? What’s Next?
- Setup Guide: Official Repository; Environment: Google Colab, Binder, or Local Installation (Anaconda, Conda virtual environments, PyTorch using GPU/CUDA or CPU, TensorBoard, GraphViz and Torchviz (optional), Git, Jupyter); Moving On

Each chapter opens with Spoilers, Jupyter Notebook, and Imports sections and closes with a Recap (most also have a "Putting It All Together" section).

Part I: Fundamentals
- Chapter 0: Visualizing Gradient Descent. The model; synthetic data generation; train-validation-test split; the five steps of gradient descent (random initialization, computing the model's predictions, computing the loss and its surface, computing and visualizing the gradients, backpropagation, updating the parameters, rinse and repeat!); learning rates (small, big, and very big); "bad" features and scaling / standardizing / normalizing; the path of gradient descent
- Chapter 1: A Simple Regression Problem. The gradient descent steps revisited; linear regression in Numpy; PyTorch tensors; loading data, devices, and CUDA; creating parameters; autograd (backward, grad, zero_, no_grad); the dynamic computation graph; optimizers (step / zero_grad); loss; models (parameters, state_dict, device, forward pass, train, nested and sequential models, layers); data preparation, model configuration, and model training
- Chapter 2: Rethinking the Training Loop. The training step; Dataset, TensorDataset, and DataLoader; the mini-batch inner loop; random split; evaluation; plotting losses; TensorBoard, inside a notebook or separately (SummaryWriter, add_graph, add_scalars); saving and loading models; resuming training; deploying / making predictions; setting the model’s mode
- Chapter 2.1: Going Classy. The class: constructor, arguments, placeholders, variables, and functions; training, saving and loading, and visualization methods; the full code; the classy pipeline; making predictions; checkpointing and resuming training
- Chapter 3: A Simple Classification Problem. Logits, probabilities, odds ratio, and log odds ratio; from logits to probabilities with the sigmoid; logistic regression; BCELoss and BCEWithLogitsLoss; imbalanced datasets; decision boundaries and classification thresholds; confusion matrix; metrics (true and false positive rates, precision and recall, accuracy); trade-offs and curves (low and high thresholds, ROC and PR curves, the precision quirk, best and worst curves); comparing models; further reading

Part II: Computer Vision
- Chapter 4: Classifying Images. NCHW vs NHWC; Torchvision (datasets, models, transforms on images and tensors, Normalize, composing transforms); data preparation (dataset transforms, SubsetRandomSampler, data augmentation, WeightedRandomSampler, seeds and more seeds); pixels as features; shallow, deep-ish, and deep models, with the math and the code; weights as pixels; activation functions (sigmoid, hyperbolic tangent, ReLU, Leaky ReLU, PReLU)
- Bonus Chapter: Feature Space. Two-dimensional feature spaces and transformations; a two-dimensional model; decision boundaries, activation style; more functions, more layers, and more dimensions mean more boundaries
- Chapter 5: Convolutions. Filters / kernels; convolving, moving around, and shapes; convolving in PyTorch; striding and padding; a REAL filter; pooling; flattening; typical architectures (LeNet-5); a multiclass classification problem; softmax, LogSoftmax, negative log-likelihood, and cross-entropy losses (a classification losses showdown!); visualizing filters, feature maps, and classifier layers; hooks; accuracy; loader apply
- Chapter 6: Rock, Paper, Scissors. The Rock Paper Scissors dataset; ImageFolder; standardization; three-channel convolutions; a fancier model; dropout (including two-dimensional dropout) and its regularizing effect; finding a learning rate; adaptive learning rates (moving averages, EWMA, EWMA meets gradients, Adam); SGD flavors (momentum, Nesterov); learning rate schedulers (epoch, validation loss, and mini-batch schedulers; scheduler paths; adaptive vs cycling)
- Chapter 7: Transfer Learning. ImageNet and the ILSVRC (AlexNet, VGG, Inception, ResNet); comparing architectures; transfer learning in practice (pre-trained models, adaptive pooling, loading weights, model freezing, the top of the model); generating a dataset of features; auxiliary classifiers (side-heads); 1x1 convolutions; inception modules; batch normalization (running statistics, evaluation phase, momentum, BatchNorm2d, other normalizations); residual connections (learning the identity, the power of shortcuts, residual blocks); fine-tuning vs feature extraction
- Extra Chapter: Vanishing and Exploding Gradients. The ball dataset and block model; weights, activations, and gradients; initialization schemes; batch normalization; gradient clipping (value clipping, norm clipping / gradient scaling, clipping with hooks)

Part III: Sequences
- Chapter 8: Sequences. Recurrent neural networks (RNN cells and layers, shapes, stacked and bidirectional RNNs); gated recurrent units (GRUs); long short-term memory (LSTM); the Square Models; visualizing hidden states and their journeys; variable-length sequences (padding, packing, unpacking, collate functions); 1D convolutions (shapes, multiple features or channels, dilation); the "There Can Be Only ONE" model
- Chapter 9 - Part I: Sequence-To-Sequence. The encoder-decoder architecture; teacher forcing; attention ("values", "keys", and "queries"; computing the context vector; scoring methods; scaled dot product); source masks; visualizing predictions and attention; multi-headed attention
- Chapter 9 - Part II: Sequence-To-Sequence. Self-attention (encoder); cross-attention (decoder); subsequent inputs and teacher forcing; target masks for training and for evaluation/prediction; sequential no more: positional encoding (PE); model assembly (self-attention "layers", attention heads)
- Chapter 10: Transform and Roll Out. Narrow attention and chunking; multi-headed attention; stacking encoders and decoders; wrapping "sub-layers"; transformer encoders and decoders; layer normalization (batch vs layer); projections or embeddings; the Transformer; the PyTorch Transformer; the Vision Transformer (patches, rearranging, embeddings, the special classifier token)

Part IV: Natural Language Processing
- Chapter 11: Down the Yellow Brick Rabbit Hole. Building a dataset; sentence and word tokenization; vocabularies; HuggingFace’s Dataset and Tokenizer; before word embeddings: one-hot encoding (OHE), bag-of-words (BoW), language models, N-grams, continuous bag-of-words (CBoW); word embeddings (Word2Vec, GloVe, and what is an embedding anyway?); using word embeddings (vocabulary coverage, special tokens’ embeddings); Model I: GloVe + classifier; Model II: GloVe + Transformer; contextual word embeddings (ELMo, BERT); document embeddings; Model III: preprocessed embeddings; BERT (tokenization, input embeddings, pretraining tasks: masked language model and next sentence prediction); Model IV: classifying using BERT; fine-tuning with HuggingFace (sequence classification or regression, tokenized datasets, Trainer, predictions, pipelines); GPT-2; "packed" datasets; generating text
- Thank You!
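As a taste of what Parts III and IV build toward, the scaled dot-product attention mechanism at the heart of transformers can be sketched in a few lines of PyTorch (an illustrative sketch of my own, not the book’s code):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # Scores: similarity between queries and keys, scaled by sqrt(d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    # Softmax over the keys turns scores into attention weights
    alphas = F.softmax(scores, dim=-1)
    # Context vector: attention-weighted average of the values
    return alphas @ v

# One batch, sequence length 3, hidden dimension 4
q = torch.randn(1, 3, 4)
k = torch.randn(1, 3, 4)
v = torch.randn(1, 3, 4)
context = scaled_dot_product_attention(q, k, v)
print(context.shape)  # torch.Size([1, 3, 4])
```

In multi-headed attention, this same computation simply runs several times in parallel on chunks of the hidden dimension, with the results concatenated.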