Foundations of deep reinforcement learning: theory and practice in Python
Saved in:

Main authors: | Graesser, Laura; Keng, Wah Loon |
---|---|
Format: | Book |
Language: | English |
Published: | Boston : Addison-Wesley, 2021 |
Series: | Addison Wesley data & analytics series |
Subjects: | Künstliche Intelligenz; Python (Programmiersprache); Operante Konditionierung; Maschinelles Lernen |
Online access: | Table of contents; Blurb |
Summary: | The Contemporary Introduction to Deep Reinforcement Learning that Combines Theory and Practice. Deep reinforcement learning (deep RL) combines deep learning and reinforcement learning, in which artificial agents learn to solve sequential decision-making problems. In the past decade deep RL has achieved remarkable results on a range of problems, from single- and multiplayer games--such as Go, Atari games, and Dota 2--to robotics. Foundations of Deep Reinforcement Learning is an introduction to deep RL that uniquely combines both theory and implementation. It starts with intuition, then carefully explains the theory of deep RL algorithms, discusses implementations in its companion software library SLM Lab, and finishes with the practical details of getting deep RL to work. Understand each key aspect of a deep RL problem. Explore policy- and value-based algorithms, including REINFORCE, SARSA, DQN, Double DQN, and Prioritized Experience Replay (PER). Delve into combined algorithms, including Actor-Critic and Proximal Policy Optimization (PPO). Understand how algorithms can be parallelized synchronously and asynchronously. Run algorithms in SLM Lab and learn the practical implementation details for getting deep RL to work. Explore algorithm benchmark results with tuned hyperparameters. Understand how deep RL environments are designed. This guide is ideal for both computer science students and software engineers who are familiar with basic machine learning concepts and have a working understanding of Python. Register your book for convenient access to downloads, updates, and/or corrections as they become available; see inside the book for details. |
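To give a concrete flavor of the policy-gradient material the summary mentions, the following is a minimal sketch of a REINFORCE-style update in PyTorch. It is not taken from the book or from SLM Lab; the network shape, hyperparameters, and function names are illustrative assumptions only.

```python
# Illustrative sketch (not from the book): a minimal REINFORCE-style
# policy-gradient update for a discrete-action task, using PyTorch.
import torch
import torch.nn as nn
from torch.distributions import Categorical

# Hypothetical sizes and hyperparameters; a real agent would take these
# from the environment and a tuned spec.
state_dim, action_dim, gamma, lr = 4, 2, 0.99, 0.01

policy = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=lr)

def select_action(state):
    # The policy outputs logits; sample an action and keep its log-probability.
    dist = Categorical(logits=policy(torch.as_tensor(state, dtype=torch.float32)))
    action = dist.sample()
    return action.item(), dist.log_prob(action)

def reinforce_update(log_probs, rewards):
    # Monte Carlo returns G_t, then one gradient step on -sum_t log pi(a_t|s_t) * G_t.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns, dtype=torch.float32)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice the book-style implementations add refinements such as a baseline and return normalization; this sketch shows only the core update.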
Description: | Bibliography: pages 353-362 |
Physical description: | xxvii, 379 pages : illustrations, diagrams ; 23 cm |
ISBN: | 9780135172384 0135172381 |
Internal format

MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV046858587 | ||
003 | DE-604 | ||
005 | 20220330 | ||
007 | t | ||
008 | 200819s2021 xxua||| |||| 00||| eng d | ||
020 | |a 9780135172384 |c paperback |9 978-0-13-517238-4 | ||
020 | |a 0135172381 |c paperback |9 0-13-517238-1 | ||
035 | |a (OCoLC)1191892931 | ||
035 | |a (DE-599)KXP1700578286 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
044 | |a xxu |c XD-US | ||
049 | |a DE-573 |a DE-83 |a DE-29T |a DE-384 |a DE-355 |a DE-858 |a DE-863 |a DE-898 | ||
050 | 0 | |a Q325.6 | |
082 | 0 | |a 006.31 | |
084 | |a ST 300 |0 (DE-625)143650: |2 rvk | ||
084 | |a ST 302 |0 (DE-625)143652: |2 rvk | ||
084 | |a ST 250 |0 (DE-625)143626: |2 rvk | ||
100 | 1 | |a Graesser, Laura |e Verfasser |0 (DE-588)1216177937 |4 aut | |
245 | 1 | 0 | |a Foundations of deep reinforcement learning |b theory and practice in Python |c Laura Graesser, Wah Loon Keng |
264 | 1 | |a Boston |b Addison-Wesley |c 2021 | |
264 | 4 | |c © 2020 | |
300 | |a xxvii, 379 Seiten |b Illustrationen, Diagramme |c 23 cm | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Addison Wesley data & analytics series | |
500 | |a Literaturverzeichnis Seite 353-362 | ||
520 | 3 | |a The Contemporary Introduction to Deep Reinforcement Learning that Combines Theory and Practice Deep reinforcement learning (deep RL) combines deep learning and reinforcement learning, in which artificial agents learn to solve sequential decision-making problems. In the past decade deep RL has achieved remarkable results on a range of problems, from single and multiplayer games--such as Go, Atari games, and DotA 2--to robotics. Foundations of Deep Reinforcement Learning is an introduction to deep RL that uniquely combines both theory and implementation. It starts with intuition, then carefully explains the theory of deep RL algorithms, discusses implementations in its companion software library SLM Lab, and finishes with the practical details of getting deep RL to work. Understand each key aspect of a deep RL problem Explore policy- and value-based algorithms, including REINFORCE, SARSA, DQN, Double DQN, and Prioritized Experience Replay (PER) Delve into combined algorithms, including Actor-Critic and Proximal Policy Optimization (PPO) Understand how algorithms can be parallelized synchronously and asynchronously Run algorithms in SLM Lab and learn the practical implementation details for getting deep RL to work Explore algorithm benchmark results with tuned hyperparameters Understand how deep RL environments are designed This guide is ideal for both computer science students and software engineers who are familiar with basic machine learning concepts and have a working understanding of Python. Register your book for convenient access to downloads, updates, and/or corrections as they become available. See inside book for details | |
650 | 0 | 7 | |a Künstliche Intelligenz |0 (DE-588)4033447-8 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Python |g Programmiersprache |0 (DE-588)4434275-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Operante Konditionierung |0 (DE-588)4172613-3 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |2 gnd |9 rswk-swf |
653 | 0 | |a Reinforcement learning | |
653 | 0 | |a Python (Computer program language) | |
653 | 0 | |a Python (Computer program language) | |
689 | 0 | 0 | |a Operante Konditionierung |0 (DE-588)4172613-3 |D s |
689 | 0 | 1 | |a Künstliche Intelligenz |0 (DE-588)4033447-8 |D s |
689 | 0 | |5 DE-604 | |
689 | 1 | 0 | |a Python |g Programmiersprache |0 (DE-588)4434275-5 |D s |
689 | 1 | 1 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |D s |
689 | 1 | 2 | |a Künstliche Intelligenz |0 (DE-588)4033447-8 |D s |
689 | 1 | |5 DE-604 | |
700 | 1 | |a Keng, Wah Loon |e Verfasser |0 (DE-588)1205939903 |4 aut | |
856 | 4 | 2 | |m Digitalisierung UB Regensburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032267284&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m Digitalisierung UB Regensburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032267284&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Klappentext |
Record in the search index
DE-BY-863_location | 1000 |
---|---|
DE-BY-FWS_call_number | 1000/ST 302 G735 |
DE-BY-FWS_katkey | 956303 |
DE-BY-FWS_media_number | 083101200107 |
_version_ | 1806528276793917440 |
adam_text |
Contents

Foreword xix
Preface xxi
Acknowledgments xxv
About the Authors xxvii

1 Introduction to Reinforcement Learning 1
1.1 Reinforcement Learning 1
1.2 Reinforcement Learning as MDP 6
1.3 Learnable Functions in Reinforcement Learning 9
1.4 Deep Reinforcement Learning Algorithms 11
1.4.1 Policy-Based Algorithms 12
1.4.2 Value-Based Algorithms 13
1.4.3 Model-Based Algorithms 13
1.4.4 Combined Methods 15
1.4.5 Algorithms Covered in This Book 15
1.4.6 On-Policy and Off-Policy Algorithms 16
1.4.7 Summary 16
1.5 Deep Learning for Reinforcement Learning 17
1.6 Reinforcement Learning and Supervised Learning 19
1.6.1 Lack of an Oracle 19
1.6.2 Sparsity of Feedback 20
1.6.3 Data Generation 20
1.7 Summary 21

I Policy-Based and Value-Based Algorithms 23

2 REINFORCE 25
2.1 Policy 26
2.2 The Objective Function 26
2.3 The Policy Gradient 27
2.3.1 Policy Gradient Derivation 28
2.4 Monte Carlo Sampling 30
2.5 REINFORCE Algorithm 31
2.5.1 Improving REINFORCE 32
2.6 Implementing REINFORCE 33
2.6.1 A Minimal REINFORCE Implementation 33
2.6.2 Constructing Policies with PyTorch 36
2.6.3 Sampling Actions 38
2.6.4 Calculating Policy Loss 39
2.6.5 REINFORCE Training Loop 40
2.6.6 On-Policy Replay Memory 41
2.7 Training a REINFORCE Agent 44
2.8 Experimental Results 47
2.8.1 Experiment: The Effect of Discount Factor γ 47
2.8.2 Experiment: The Effect of Baseline 49
2.9 Summary 51
2.10 Further Reading 51
2.11 History 51

3 SARSA 53
3.1 The Q- and V-Functions 54
3.2 Temporal Difference Learning 56
3.2.1 Intuition for Temporal Difference Learning 59
3.3 Action Selection in SARSA 65
3.3.1 Exploration and Exploitation 66
3.4 SARSA Algorithm 67
3.4.1 On-Policy Algorithms 68
3.5 Implementing SARSA 69
3.5.1 Action Function: ε-Greedy 69
3.5.2 Calculating the Q-Loss 70
3.5.3 SARSA Training Loop 71
3.5.4 On-Policy Batched Replay Memory 72
3.6 Training a SARSA Agent 74
3.7 Experimental Results 76
3.7.1 Experiment: The Effect of Learning Rate 77
3.8 Summary 78
3.9 Further Reading 79
3.10 History 79

4 Deep Q-Networks (DQN) 81
4.1 Learning the Q-Function in DQN 82
4.2 Action Selection in DQN 83
4.2.1 The Boltzmann Policy 86
4.3 Experience Replay 88
4.4 DQN Algorithm 89
4.5 Implementing DQN 91
4.5.1 Calculating the Q-Loss 91
4.5.2 DQN Training Loop 92
4.5.3 Replay Memory 93
4.6 Training a DQN Agent 96
4.7 Experimental Results 99
4.7.1 Experiment: The Effect of Network Architecture 99
4.8 Summary 101
4.9 Further Reading 102
4.10 History 102

5 Improving DQN 103
5.1 Target Networks 104
5.2 Double DQN 106
5.3 Prioritized Experience Replay (PER) 109
5.3.1 Importance Sampling 111
5.4 Modified DQN Implementation 112
5.4.1 Network Initialization 113
5.4.2 Calculating the Q-Loss 113
5.4.3 Updating the Target Network 115
5.4.4 DQN with Target Networks 116
5.4.5 Double DQN 116
5.4.6 Prioritized Experience Replay 117
5.5 Training a DQN Agent to Play Atari Games 123
5.6 Experimental Results 128
5.6.1 Experiment: The Effect of Double DQN and PER 128
5.7 Summary 132
5.8 Further Reading 132

II Combined Methods 133

6 Advantage Actor-Critic (A2C) 135
6.1 The Actor 136
6.2 The Critic 136
6.2.1 The Advantage Function 136
6.2.2 Learning the Advantage Function 140
6.3 A2C Algorithm 141
6.4 Implementing A2C 143
6.4.1 Advantage Estimation 144
6.4.2 Calculating Value Loss and Policy Loss 147
6.4.3 Actor-Critic Training Loop 147
6.5 Network Architecture 148
6.6 Training an A2C Agent 150
6.6.1 A2C with n-Step Returns on Pong 150
6.6.2 A2C with GAE on Pong 153
6.6.3 A2C with n-Step Returns on BipedalWalker 155
6.7 Experimental Results 157
6.7.1 Experiment: The Effect of n-Step Returns 158
6.7.2 Experiment: The Effect of λ of GAE 159
6.8 Summary 161
6.9 Further Reading 162
6.10 History 162

7 Proximal Policy Optimization (PPO) 165
7.1 Surrogate Objective 165
7.1.1 Performance Collapse 166
7.1.2 Modifying the Objective 168
7.2 Proximal Policy Optimization (PPO) 174
7.3 PPO Algorithm 177
7.4 Implementing PPO 179
7.4.1 Calculating the PPO Policy Loss 179
7.4.2 PPO Training Loop 180
7.5 Training a PPO Agent 182
7.5.1 PPO on Pong 182
7.5.2 PPO on BipedalWalker 185
7.6 Experimental Results 188
7.6.1 Experiment: The Effect of λ of GAE 188
7.6.2 Experiment: The Effect of Clipping Variable ε 190
7.7 Summary 192
7.8 Further Reading 192

8 Parallelization Methods 195
8.1 Synchronous Parallelization 196
8.2 Asynchronous Parallelization 197
8.2.1 Hogwild! 198
8.3 Training an A3C Agent 200
8.4 Summary 203
8.5 Further Reading 204

9 Algorithm Summary 205

III Practical Details 207

10 Getting Deep RL to Work 209
10.1 Software Engineering Practices 209
10.1.1 Unit Tests 210
10.1.2 Code Quality 215
10.1.3 Git Workflow 216
10.2 Debugging Tips 218
10.2.1 Signs of Life 219
10.2.2 Policy Gradient Diagnoses 219
10.2.3 Data Diagnoses 220
10.2.4 Preprocessor 222
10.2.5 Memory 222
10.2.6 Algorithmic Functions 222
10.2.7 Neural Networks 222
10.2.8 Algorithm Simplification 225
10.2.9 Problem Simplification 226
10.2.10 Hyperparameters 226
10.2.11 Lab Workflow 226
10.3 Atari Tricks 228
10.4 Deep RL Almanac 231
10.4.1 Hyperparameter Tables 231
10.4.2 Algorithm Performance Comparison 234
10.5 Summary 238

11 SLM Lab 239
11.1 Algorithms Implemented in SLM Lab 239
11.2 Spec File 241
11.2.1 Search Spec Syntax 243
11.3 Running SLM Lab 246
11.3.1 SLM Lab Commands 246
11.4 Analyzing Experiment Results 247
11.4.1 Overview of the Experiment Data 247
11.5 Summary 249

12 Network Architectures 251
12.1 Types of Neural Networks 251
12.1.1 Multilayer Perceptrons (MLPs) 252
12.1.2 Convolutional Neural Networks (CNNs) 253
12.1.3 Recurrent Neural Networks (RNNs) 255
12.2 Guidelines for Choosing a Network Family 256
12.2.1 MDPs vs. POMDPs 256
12.2.2 Choosing Networks for Environments 259
12.3 The Net API 262
12.3.1 Input and Output Layer Shape Inference 264
12.3.2 Automatic Network Construction 266
12.3.3 Training Step 269
12.3.4 Exposure of Underlying Methods 270
12.4 Summary 271
12.5 Further Reading 271

13 Hardware 273
13.1 Computer 273
13.2 Data Types 278
13.3 Optimizing Data Types in RL 280
13.4 Choosing Hardware 285
13.5 Summary 285

IV Environment Design 287

14 States 289
14.1 Examples of States 289
14.2 State Completeness 296
14.3 State Complexity 297
14.4 State Information Loss 301
14.4.1 Image Grayscaling 301
14.4.2 Discretization 302
14.4.3 Hash Conflict 303
14.4.4 Metainformation Loss 303
14.5 Preprocessing 306
14.5.1 Standardization 307
14.5.2 Image Preprocessing 308
14.5.3 Temporal Preprocessing 310
14.6 Summary 313

15 Actions 315
15.1 Examples of Actions 315
15.2 Action Completeness 318
15.3 Action Complexity 319
15.4 Summary 323
15.5 Further Reading: Action Design in Everyday Things 324

16 Rewards 327
16.1 The Role of Rewards 327
16.2 Reward Design Guidelines 328
16.3 Summary 332

17 Transition Function 333
17.1 Feasibility Checks 333
17.2 Reality Check 335
17.3 Summary 337

Epilogue 338

A Deep Reinforcement Learning Timeline 343

B Example Environments 345
B.1 Discrete Environments 346
B.1.1 CartPole-v0 346
B.1.2 MountainCar-v0 347
B.1.3 LunarLander-v2 347
B.1.4 PongNoFrameskip-v4 348
B.1.5 BreakoutNoFrameskip-v4 349
B.2 Continuous Environments 350
B.2.1 Pendulum-v0 350
B.2.2 BipedalWalker-v2 350

References 353
Index 363
THE CONTEMPORARY INTRODUCTION TO DEEP REINFORCEMENT LEARNING THAT COMBINES THEORY AND PRACTICE

Deep reinforcement learning (deep RL) combines deep learning and reinforcement learning, in which artificial agents learn to solve sequential decision-making problems. In the past decade deep RL has achieved remarkable results on a range of problems, from single- and multiplayer games, such as Go, Atari games, and Dota 2, to robotics.

Foundations of Deep Reinforcement Learning is an introduction to deep RL that uniquely combines both theory and implementation. It starts with intuition, then carefully explains the theory of deep RL algorithms, discusses implementations in its companion software library SLM Lab, and finishes with the practical details of getting deep RL to work.

• Understand each key aspect of a deep RL problem
• Explore policy- and value-based algorithms, including REINFORCE, SARSA, DQN, Double DQN, and Prioritized Experience Replay (PER)
• Delve into combined algorithms, including Actor-Critic and Proximal Policy Optimization (PPO)
• Understand how algorithms can be parallelized synchronously and asynchronously
• Run algorithms in SLM Lab and learn the practical implementation details for getting deep RL to work
• Explore algorithm benchmark results with tuned hyperparameters
• Understand how deep RL environments are designed

This guide is ideal for both computer science students and software engineers who are familiar with basic machine learning concepts and have a working understanding of Python.

LAURA GRAESSER is a research software engineer working in robotics at Google. She holds a master's degree in computer science from New York University, where she specialized in machine learning.

WAH LOON KENG is an AI engineer at Machine Zone, where he applies deep reinforcement learning to industrial problems. He has a background in both theoretical physics and computer science.

Together, they've developed two deep RL software libraries and presented many talks and tutorials on the subject. |
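The back-cover text above lists SARSA and DQN among the value-based algorithms covered. As a companion to the policy-gradient sketch earlier in this record, here is a minimal, hedged tabular sketch of ε-greedy action selection and the on-policy SARSA update; the dictionary-based Q-table and the default hyperparameters are illustrative assumptions, not the book's or SLM Lab's implementation.

```python
# Illustrative sketch (not from the book): epsilon-greedy action selection and
# the SARSA temporal-difference update for a tabular Q-function.
import random

def epsilon_greedy(q_table, state, actions, epsilon=0.1):
    # With probability epsilon explore uniformly; otherwise pick the greedy action.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))

def sarsa_update(q_table, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy TD target: uses the action actually selected in the next state.
    target = r + gamma * q_table.get((s_next, a_next), 0.0)
    q_table[(s, a)] = q_table.get((s, a), 0.0) + alpha * (target - q_table.get((s, a), 0.0))
```

The deep variants in the book replace the table with a neural network and the direct assignment with a gradient step on a Q-loss, but the target structure is the same.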
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Graesser, Laura Keng, Wah Loon |
author_GND | (DE-588)1216177937 (DE-588)1205939903 |
author_facet | Graesser, Laura Keng, Wah Loon |
author_role | aut aut |
author_sort | Graesser, Laura |
author_variant | l g lg w l k wl wlk |
building | Verbundindex |
bvnumber | BV046858587 |
callnumber-first | Q - Science |
callnumber-label | Q325 |
callnumber-raw | Q325.6 |
callnumber-search | Q325.6 |
callnumber-sort | Q 3325.6 |
callnumber-subject | Q - General Science |
classification_rvk | ST 300 ST 302 ST 250 |
ctrlnum | (OCoLC)1191892931 (DE-599)KXP1700578286 |
dewey-full | 006.31 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.31 |
dewey-search | 006.31 |
dewey-sort | 16.31 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik |
discipline_str_mv | Informatik |
format | Book |
id | DE-604.BV046858587 |
illustrated | Illustrated |
index_date | 2024-07-03T15:12:09Z |
indexdate | 2024-08-05T08:33:39Z |
institution | BVB |
isbn | 9780135172384 0135172381 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-032267284 |
oclc_num | 1191892931 |
open_access_boolean | |
owner | DE-573 DE-83 DE-29T DE-384 DE-355 DE-BY-UBR DE-858 DE-863 DE-BY-FWS DE-898 DE-BY-UBR |
owner_facet | DE-573 DE-83 DE-29T DE-384 DE-355 DE-BY-UBR DE-858 DE-863 DE-BY-FWS DE-898 DE-BY-UBR |
physical | xxvii, 379 Seiten Illustrationen, Diagramme 23 cm |
publishDate | 2021 |
publishDateSearch | 2021 |
publishDateSort | 2021 |
publisher | Addison-Wesley |
record_format | marc |
series2 | Addison Wesley data & analytics series |
spellingShingle | Graesser, Laura Keng, Wah Loon Foundations of deep reinforcement learning theory and practice in Python Künstliche Intelligenz (DE-588)4033447-8 gnd Python Programmiersprache (DE-588)4434275-5 gnd Operante Konditionierung (DE-588)4172613-3 gnd Maschinelles Lernen (DE-588)4193754-5 gnd |
subject_GND | (DE-588)4033447-8 (DE-588)4434275-5 (DE-588)4172613-3 (DE-588)4193754-5 |
title | Foundations of deep reinforcement learning theory and practice in Python |
title_auth | Foundations of deep reinforcement learning theory and practice in Python |
title_exact_search | Foundations of deep reinforcement learning theory and practice in Python |
title_exact_search_txtP | Foundations of deep reinforcement learning theory and practice in Python |
title_full | Foundations of deep reinforcement learning theory and practice in Python Laura Graesser, Wah Loon Keng |
title_fullStr | Foundations of deep reinforcement learning theory and practice in Python Laura Graesser, Wah Loon Keng |
title_full_unstemmed | Foundations of deep reinforcement learning theory and practice in Python Laura Graesser, Wah Loon Keng |
title_short | Foundations of deep reinforcement learning |
title_sort | foundations of deep reinforcement learning theory and practice in python |
title_sub | theory and practice in Python |
topic | Künstliche Intelligenz (DE-588)4033447-8 gnd Python Programmiersprache (DE-588)4434275-5 gnd Operante Konditionierung (DE-588)4172613-3 gnd Maschinelles Lernen (DE-588)4193754-5 gnd |
topic_facet | Künstliche Intelligenz Python Programmiersprache Operante Konditionierung Maschinelles Lernen |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032267284&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032267284&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT graesserlaura foundationsofdeepreinforcementlearningtheoryandpracticeinpython AT kengwahloon foundationsofdeepreinforcementlearningtheoryandpracticeinpython |
THWS Würzburg, Central Library Reading Room

Call number: | 1000 ST 302 G735 |
---|---|
Copy 1 | Loanable; currently checked out – due back: 30.04.2025 |