Practical Deep Reinforcement Learning with Python: Concise Implementation of Algorithms, Simplified Maths, and Effective Use of TensorFlow and PyTorch
Introducing Practical Smart Agents Development using Python, PyTorch, and TensorFlow
- Exposure to well-known RL techniques, including Monte Carlo, Deep Q-Learning, Policy Gradient, and Actor-Critic.
- Hands-on experience with TensorFlow and PyTorch on Reinforcement Learning projects.
- Everything is concise, up-to-date, and visually explained with simplified mathematics.
Reinforcement learning is a fascinating branch of AI that differs from standard machine learning in a key way: the agent must adapt and learn in an unpredictable environment. Reinforcement learning now has numerous real-world applications, including medicine, game playing, imitation of human behavior, and robotics.
This book introduces readers to reinforcement learning from a pragmatic point of view. The book does involve mathematics, but it does not attempt to overburden readers who are new to the field.
The book brings many practical methods to the reader's attention, including Monte Carlo, Deep Q-Learning, Policy Gradient, and Actor-Critic methods. Alongside detailed explanations of these techniques, the book provides working implementations of each using TensorFlow and PyTorch. It also covers several enticing projects that demonstrate the power of reinforcement learning, and everything is concise, up-to-date, and visually explained.
After finishing this book, readers will have a thorough, intuitive understanding of modern reinforcement learning and its applications, giving them a strong foundation for exploring the field further.
What you will learn
- Familiarize yourself with the fundamentals of Reinforcement Learning and Deep Reinforcement Learning.
- Make use of Python and Gym framework to model an external environment.
- Apply classical Q-learning, Monte Carlo, Policy Gradient, and Thompson sampling techniques.
- Explore TensorFlow and PyTorch to practice the fundamentals of deep reinforcement learning.
- Design a smart agent for a particular problem using a specific technique.
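To give a flavor of the techniques listed above, here is a minimal sketch of an epsilon-greedy agent on a multi-armed bandit, one of the problems the book covers. It is written in plain Python; the arm reward probabilities, step count, and epsilon value below are illustrative choices, not taken from the book:

```python
import random

def epsilon_greedy_bandit(true_means, steps=10_000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli multi-armed bandit.

    true_means: hypothetical per-arm probabilities of a reward of 1.
    Returns the estimated value of each arm and the most-pulled arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # number of pulls per arm
    values = [0.0] * n_arms    # running mean reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental mean update: V <- V + (r - V) / n
        values[arm] += (reward - values[arm]) / counts[arm]

    best = max(range(n_arms), key=lambda a: counts[a])
    return values, best

values, best = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With enough steps, the agent's value estimates approach the true arm means and the most-pulled arm is the one with the highest true mean; the book's own treatment also compares this policy against Thompson sampling.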
Who this book is for
This book is for machine learning engineers, deep learning fanatics, AI software developers, data scientists, and other data professionals eager to learn and apply Reinforcement Learning to ongoing projects. No specialized knowledge of machine learning is necessary; however, proficiency in Python is desired.
Table of Contents

Front matter: Cover Page, Title Page, Copyright Page, Dedication Page, About the Author, About the Reviewer, Acknowledgement, Preface, Errata, Table of Contents

(Each chapter also includes Structure, Objectives, Conclusion, Points to Remember, Multiple Choice Questions, Answers, and Key Terms sections.)

Part I
1. Introducing Reinforcement Learning
   - What is reinforcement learning? Reinforcement learning mechanics; reinforcement learning vs. supervised learning; examples of reinforcement learning: stock trading, chess, Neural Architecture Search (NAS)
2. Playing Monopoly and Markov Decision Process
   - Choosing the best strategy for playing Monopoly; list of rules; Markov chain; Markov reward process; Markov decision process (state, probability, reward, actions, policy); examples: Blackjack, stock trading, video games; Monopoly as a Markov decision process
3. Training in Gym
   - Why do we need Gym? Installation; CartPole environment; interacting with Gym (list of environments, environment initialization, reproducible script, action space, reset, render, sending actions, closing the environment); Gym environments: Lunar Lander, Mountain Car, Phoenix; custom environment (initialization, step, reset, render); custom environment with PyGame
4. Struggling with Multi-Armed Bandits
   - Gambling with multi-armed bandits; online advertising; clinical trials; emulating multi-armed bandits in Gym; epsilon-greedy policy; Thompson sampling policy; visualization; epsilon-greedy versus Thompson sampling; exploration versus exploitation
5. Blackjack in Monte Carlo
   - Blackjack as a reinforcement learning problem; Q(s, a), the action-value function; the Monte Carlo method; Monte Carlo policy; exploration and greedy-policy exploitation; optimal policy for unbalanced Blackjack
6. Escaping Maze with Q-Learning
   - Maze; Q-learning; solving the maze problem; Q-learning vs. the Monte Carlo method; dense vs. sparse rewards
7. Discretization
   - Discretization of continuous variables; discretization of the Mountain Car state space; decayed epsilon-greedy policy; discrete Q-learning agent; applying the discrete Q-learning agent to the Mountain Car problem (training, testing, running live); coarse versus fine discretization; the Q-learning alpha parameter; hyperparameters in reinforcement learning; from the limits of discretization to deep reinforcement learning

Part II: Deep Reinforcement Learning
8. TensorFlow, PyTorch, and Your First Neural Network
   - Installation; derivative calculators; deep learning basics; tensors (creation, random tensors, reproducibility, common types, methods and attributes, math functions); deep learning layers (linear, convolution, pooling, dropout, flatten); activations (ReLU, sigmoid, tanh, softmax); neural network architecture; supervised learning and loss functions (classification, regression); training and optimizers; epoch and batch size; handwritten digit recognition; model
9. Deep Q-Network and Lunar Lander
   - Neural networks in reinforcement learning; convergence of temporal difference and the DQN training loss function; replay buffer; DQN implementation; Lunar Lander with a DQN agent (states, actions, environment, DQN application)
10. Defending Atlantis with Double Deep Q-Network
   - Atlantis gameplay; Atlantis environment; capturing motion; convolutional Q-network; Double Deep Q-Network; defending Atlantis using DDQN
11. From Q-Learning to Policy-Gradient
   - Stochastic policy; stochastic vs. deterministic policy; parametric policy; neural network as a parametric stochastic policy; the Policy Gradient method; Policy Gradient implementation; solving the CartPole problem
12. Stock Trading with Actor-Critic
   - Policy gradient training drawbacks; Actor-Critic theory; A2C implementation; A2C vs. policy gradient; the stock trading problem (environment, solution)
13. What Is Next?
   - Reinforcement learning overview; reread; deep learning; practice

Index