Foundations of deep reinforcement learning: theory and practice in Python
Saved in:

Main authors: | Graesser, Laura; Keng, Wah Loon |
---|---|
Format: | Book |
Language: | English |
Published: | Boston : Addison-Wesley, 2021 |
Series: | Addison Wesley data & analytics series |
Subjects: | Künstliche Intelligenz; Python (Programmiersprache); Operante Konditionierung; Maschinelles Lernen |
Online access: | Table of contents; Blurb |
Summary: | The Contemporary Introduction to Deep Reinforcement Learning that Combines Theory and Practice. Deep reinforcement learning (deep RL) combines deep learning and reinforcement learning, in which artificial agents learn to solve sequential decision-making problems. In the past decade deep RL has achieved remarkable results on a range of problems, from single- and multiplayer games--such as Go, Atari games, and Dota 2--to robotics. Foundations of Deep Reinforcement Learning is an introduction to deep RL that uniquely combines both theory and implementation. It starts with intuition, then carefully explains the theory of deep RL algorithms, discusses implementations in its companion software library SLM Lab, and finishes with the practical details of getting deep RL to work. Understand each key aspect of a deep RL problem. Explore policy- and value-based algorithms, including REINFORCE, SARSA, DQN, Double DQN, and Prioritized Experience Replay (PER). Delve into combined algorithms, including Actor-Critic and Proximal Policy Optimization (PPO). Understand how algorithms can be parallelized synchronously and asynchronously. Run algorithms in SLM Lab and learn the practical implementation details for getting deep RL to work. Explore algorithm benchmark results with tuned hyperparameters. Understand how deep RL environments are designed. This guide is ideal for both computer science students and software engineers who are familiar with basic machine learning concepts and have a working understanding of Python. Register your book for convenient access to downloads, updates, and/or corrections as they become available; see inside the book for details. |
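To give a concrete flavor of the policy-gradient material the summary mentions, the following is a minimal sketch of a REINFORCE-style update in PyTorch. It is not taken from the book or from SLM Lab; the network shape, hyperparameters, and function names are illustrative assumptions only.

```python
# Illustrative sketch (not from the book): a minimal REINFORCE-style
# policy-gradient update for a discrete-action task, using PyTorch.
import torch
import torch.nn as nn
from torch.distributions import Categorical

# Hypothetical sizes and hyperparameters; a real agent would take these
# from the environment and a tuned spec.
state_dim, action_dim, gamma, lr = 4, 2, 0.99, 0.01

policy = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=lr)

def select_action(state):
    # The policy outputs logits; sample an action and keep its log-probability.
    dist = Categorical(logits=policy(torch.as_tensor(state, dtype=torch.float32)))
    action = dist.sample()
    return action.item(), dist.log_prob(action)

def reinforce_update(log_probs, rewards):
    # Monte Carlo returns G_t, then one gradient step on -sum_t log pi(a_t|s_t) * G_t.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns, dtype=torch.float32)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice the book-style implementations add refinements such as a baseline and return normalization; this sketch shows only the core update.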
Description: | Bibliography: pages 353-362 |
Physical description: | xxvii, 379 pages : illustrations, diagrams ; 23 cm |
ISBN: | 9780135172384 0135172381 |
Internal format

MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV046858587 | ||
003 | DE-604 | ||
005 | 20220330 | ||
007 | t | ||
008 | 200819s2021 xxua||| |||| 00||| eng d | ||
020 | |a 9780135172384 |c paperback |9 978-0-13-517238-4 | ||
020 | |a 0135172381 |c paperback |9 0-13-517238-1 | ||
035 | |a (OCoLC)1191892931 | ||
035 | |a (DE-599)KXP1700578286 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
044 | |a xxu |c XD-US | ||
049 | |a DE-573 |a DE-83 |a DE-29T |a DE-384 |a DE-355 |a DE-858 |a DE-863 |a DE-898 | ||
050 | 0 | |a Q325.6 | |
082 | 0 | |a 006.31 | |
084 | |a ST 300 |0 (DE-625)143650: |2 rvk | ||
084 | |a ST 302 |0 (DE-625)143652: |2 rvk | ||
084 | |a ST 250 |0 (DE-625)143626: |2 rvk | ||
100 | 1 | |a Graesser, Laura |e Verfasser |0 (DE-588)1216177937 |4 aut | |
245 | 1 | 0 | |a Foundations of deep reinforcement learning |b theory and practice in Python |c Laura Graesser, Wah Loon Keng |
264 | 1 | |a Boston |b Addison-Wesley |c 2021 | |
264 | 4 | |c © 2020 | |
300 | |a xxvii, 379 Seiten |b Illustrationen, Diagramme |c 23 cm | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Addison Wesley data & analytics series | |
500 | |a Literaturverzeichnis Seite 353-362 | ||
520 | 3 | |a The Contemporary Introduction to Deep Reinforcement Learning that Combines Theory and Practice Deep reinforcement learning (deep RL) combines deep learning and reinforcement learning, in which artificial agents learn to solve sequential decision-making problems. In the past decade deep RL has achieved remarkable results on a range of problems, from single and multiplayer games--such as Go, Atari games, and DotA 2--to robotics. Foundations of Deep Reinforcement Learning is an introduction to deep RL that uniquely combines both theory and implementation. It starts with intuition, then carefully explains the theory of deep RL algorithms, discusses implementations in its companion software library SLM Lab, and finishes with the practical details of getting deep RL to work. Understand each key aspect of a deep RL problem Explore policy- and value-based algorithms, including REINFORCE, SARSA, DQN, Double DQN, and Prioritized Experience Replay (PER) Delve into combined algorithms, including Actor-Critic and Proximal Policy Optimization (PPO) Understand how algorithms can be parallelized synchronously and asynchronously Run algorithms in SLM Lab and learn the practical implementation details for getting deep RL to work Explore algorithm benchmark results with tuned hyperparameters Understand how deep RL environments are designed This guide is ideal for both computer science students and software engineers who are familiar with basic machine learning concepts and have a working understanding of Python. Register your book for convenient access to downloads, updates, and/or corrections as they become available. See inside book for details | |
650 | 0 | 7 | |a Künstliche Intelligenz |0 (DE-588)4033447-8 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Python |g Programmiersprache |0 (DE-588)4434275-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Operante Konditionierung |0 (DE-588)4172613-3 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |2 gnd |9 rswk-swf |
653 | 0 | |a Reinforcement learning | |
653 | 0 | |a Python (Computer program language) | |
653 | 0 | |a Python (Computer program language) | |
689 | 0 | 0 | |a Operante Konditionierung |0 (DE-588)4172613-3 |D s |
689 | 0 | 1 | |a Künstliche Intelligenz |0 (DE-588)4033447-8 |D s |
689 | 0 | |5 DE-604 | |
689 | 1 | 0 | |a Python |g Programmiersprache |0 (DE-588)4434275-5 |D s |
689 | 1 | 1 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |D s |
689 | 1 | 2 | |a Künstliche Intelligenz |0 (DE-588)4033447-8 |D s |
689 | 1 | |5 DE-604 | |
700 | 1 | |a Keng, Wah Loon |e Verfasser |0 (DE-588)1205939903 |4 aut | |
856 | 4 | 2 | |m Digitalisierung UB Regensburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032267284&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m Digitalisierung UB Regensburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032267284&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Klappentext |
Record in the search index
DE-BY-863_location | 1000 |
---|---|
DE-BY-FWS_call_number | 1000/ST 302 G735 |
DE-BY-FWS_katkey | 956303 |
DE-BY-FWS_media_number | 083101200107 |
_version_ | 1806528276793917440 |
adam_text |
Contents

Foreword xix
Preface xxi
Acknowledgments xxv
About the Authors xxvii

1 Introduction to Reinforcement Learning 1
1.1 Reinforcement Learning 1
1.2 Reinforcement Learning as MDP 6
1.3 Learnable Functions in Reinforcement Learning 9
1.4 Deep Reinforcement Learning Algorithms 11
1.4.1 Policy-Based Algorithms 12
1.4.2 Value-Based Algorithms 13
1.4.3 Model-Based Algorithms 13
1.4.4 Combined Methods 15
1.4.5 Algorithms Covered in This Book 15
1.4.6 On-Policy and Off-Policy Algorithms 16
1.4.7 Summary 16
1.5 Deep Learning for Reinforcement Learning 17
1.6 Reinforcement Learning and Supervised Learning 19
1.6.1 Lack of an Oracle 19
1.6.2 Sparsity of Feedback 20
1.6.3 Data Generation 20
1.7 Summary 21

I Policy-Based and Value-Based Algorithms 23

2 REINFORCE 25
2.1 Policy 26
2.2 The Objective Function 26
2.3 The Policy Gradient 27
2.3.1 Policy Gradient Derivation 28
2.4 Monte Carlo Sampling 30
2.5 REINFORCE Algorithm 31
2.5.1 Improving REINFORCE 32
2.6 Implementing REINFORCE 33
2.6.1 A Minimal REINFORCE Implementation 33
2.6.2 Constructing Policies with PyTorch 36
2.6.3 Sampling Actions 38
2.6.4 Calculating Policy Loss 39
2.6.5 REINFORCE Training Loop 40
2.6.6 On-Policy Replay Memory 41
2.7 Training a REINFORCE Agent 44
2.8 Experimental Results 47
2.8.1 Experiment: The Effect of Discount Factor γ 47
2.8.2 Experiment: The Effect of Baseline 49
2.9 Summary 51
2.10 Further Reading 51
2.11 History 51

3 SARSA 53
3.1 The Q- and V-Functions 54
3.2 Temporal Difference Learning 56
3.2.1 Intuition for Temporal Difference Learning 59
3.3 Action Selection in SARSA 65
3.3.1 Exploration and Exploitation 66
3.4 SARSA Algorithm 67
3.4.1 On-Policy Algorithms 68
3.5 Implementing SARSA 69
3.5.1 Action Function: ε-Greedy 69
3.5.2 Calculating the Q-Loss 70
3.5.3 SARSA Training Loop 71
3.5.4 On-Policy Batched Replay Memory 72
3.6 Training a SARSA Agent 74
3.7 Experimental Results 76
3.7.1 Experiment: The Effect of Learning Rate 77
3.8 Summary 78
3.9 Further Reading 79
3.10 History 79

4 Deep Q-Networks (DQN) 81
4.1 Learning the Q-Function in DQN 82
4.2 Action Selection in DQN 83
4.2.1 The Boltzmann Policy 86
4.3 Experience Replay 88
4.4 DQN Algorithm 89
4.5 Implementing DQN 91
4.5.1 Calculating the Q-Loss 91
4.5.2 DQN Training Loop 92
4.5.3 Replay Memory 93
4.6 Training a DQN Agent 96
4.7 Experimental Results 99
4.7.1 Experiment: The Effect of Network Architecture 99
4.8 Summary 101
4.9 Further Reading 102
4.10 History 102

5 Improving DQN 103
5.1 Target Networks 104
5.2 Double DQN 106
5.3 Prioritized Experience Replay (PER) 109
5.3.1 Importance Sampling 111
5.4 Modified DQN Implementation 112
5.4.1 Network Initialization 113
5.4.2 Calculating the Q-Loss 113
5.4.3 Updating the Target Network 115
5.4.4 DQN with Target Networks 116
5.4.5 Double DQN 116
5.4.6 Prioritized Experience Replay 117
5.5 Training a DQN Agent to Play Atari Games 123
5.6 Experimental Results 128
5.6.1 Experiment: The Effect of Double DQN and PER 128
5.7 Summary 132
5.8 Further Reading 132

II Combined Methods 133

6 Advantage Actor-Critic (A2C) 135
6.1 The Actor 136
6.2 The Critic 136
6.2.1 The Advantage Function 136
6.2.2 Learning the Advantage Function 140
6.3 A2C Algorithm 141
6.4 Implementing A2C 143
6.4.1 Advantage Estimation 144
6.4.2 Calculating Value Loss and Policy Loss 147
6.4.3 Actor-Critic Training Loop 147
6.5 Network Architecture 148
6.6 Training an A2C Agent 150
6.6.1 A2C with n-Step Returns on Pong 150
6.6.2 A2C with GAE on Pong 153
6.6.3 A2C with n-Step Returns on BipedalWalker 155
6.7 Experimental Results 157
6.7.1 Experiment: The Effect of n-Step Returns 158
6.7.2 Experiment: The Effect of λ of GAE 159
6.8 Summary 161
6.9 Further Reading 162
6.10 History 162

7 Proximal Policy Optimization (PPO) 165
7.1 Surrogate Objective 165
7.1.1 Performance Collapse 166
7.1.2 Modifying the Objective 168
7.2 Proximal Policy Optimization (PPO) 174
7.3 PPO Algorithm 177
7.4 Implementing PPO 179
7.4.1 Calculating the PPO Policy Loss 179
7.4.2 PPO Training Loop 180
7.5 Training a PPO Agent 182
7.5.1 PPO on Pong 182
7.5.2 PPO on BipedalWalker 185
7.6 Experimental Results 188
7.6.1 Experiment: The Effect of λ of GAE 188
7.6.2 Experiment: The Effect of Clipping Variable ε 190
7.7 Summary 192
7.8 Further Reading 192

8 Parallelization Methods 195
8.1 Synchronous Parallelization 196
8.2 Asynchronous Parallelization 197
8.2.1 Hogwild! 198
8.3 Training an A3C Agent 200
8.4 Summary 203
8.5 Further Reading 204

9 Algorithm Summary 205

III Practical Details 207

10 Getting Deep RL to Work 209
10.1 Software Engineering Practices 209
10.1.1 Unit Tests 210
10.1.2 Code Quality 215
10.1.3 Git Workflow 216
10.2 Debugging Tips 218
10.2.1 Signs of Life 219
10.2.2 Policy Gradient Diagnoses 219
10.2.3 Data Diagnoses 220
10.2.4 Preprocessor 222
10.2.5 Memory 222
10.2.6 Algorithmic Functions 222
10.2.7 Neural Networks 222
10.2.8 Algorithm Simplification 225
10.2.9 Problem Simplification 226
10.2.10 Hyperparameters 226
10.2.11 Lab Workflow 226
10.3 Atari Tricks 228
10.4 Deep RL Almanac 231
10.4.1 Hyperparameter Tables 231
10.4.2 Algorithm Performance Comparison 234
10.5 Summary 238

11 SLM Lab 239
11.1 Algorithms Implemented in SLM Lab 239
11.2 Spec File 241
11.2.1 Search Spec Syntax 243
11.3 Running SLM Lab 246
11.3.1 SLM Lab Commands 246
11.4 Analyzing Experiment Results 247
11.4.1 Overview of the Experiment Data 247
11.5 Summary 249

12 Network Architectures 251
12.1 Types of Neural Networks 251
12.1.1 Multilayer Perceptrons (MLPs) 252
12.1.2 Convolutional Neural Networks (CNNs) 253
12.1.3 Recurrent Neural Networks (RNNs) 255
12.2 Guidelines for Choosing a Network Family 256
12.2.1 MDPs vs. POMDPs 256
12.2.2 Choosing Networks for Environments 259
12.3 The Net API 262
12.3.1 Input and Output Layer Shape Inference 264
12.3.2 Automatic Network Construction 266
12.3.3 Training Step 269
12.3.4 Exposure of Underlying Methods 270
12.4 Summary 271
12.5 Further Reading 271

13 Hardware 273
13.1 Computer 273
13.2 Data Types 278
13.3 Optimizing Data Types in RL 280
13.4 Choosing Hardware 285
13.5 Summary 285

IV Environment Design 287

14 States 289
14.1 Examples of States 289
14.2 State Completeness 296
14.3 State Complexity 297
14.4 State Information Loss 301
14.4.1 Image Grayscaling 301
14.4.2 Discretization 302
14.4.3 Hash Conflict 303
14.4.4 Metainformation Loss 303
14.5 Preprocessing 306
14.5.1 Standardization 307
14.5.2 Image Preprocessing 308
14.5.3 Temporal Preprocessing 310
14.6 Summary 313

15 Actions 315
15.1 Examples of Actions 315
15.2 Action Completeness 318
15.3 Action Complexity 319
15.4 Summary 323
15.5 Further Reading: Action Design in Everyday Things 324

16 Rewards 327
16.1 The Role of Rewards 327
16.2 Reward Design Guidelines 328
16.3 Summary 332

17 Transition Function 333
17.1 Feasibility Checks 333
17.2 Reality Check 335
17.3 Summary 337

Epilogue 338

A Deep Reinforcement Learning Timeline 343

B Example Environments 345
B.1 Discrete Environments 346
B.1.1 CartPole-v0 346
B.1.2 MountainCar-v0 347
B.1.3 LunarLander-v2 347
B.1.4 PongNoFrameskip-v4 348
B.1.5 BreakoutNoFrameskip-v4 349
B.2 Continuous Environments 350
B.2.1 Pendulum-v0 350
B.2.2 BipedalWalker-v2 350

References 353
Index 363
THE CONTEMPORARY INTRODUCTION TO DEEP REINFORCEMENT LEARNING THAT COMBINES THEORY AND PRACTICE

Deep reinforcement learning (deep RL) combines deep learning and reinforcement learning, in which artificial agents learn to solve sequential decision-making problems. In the past decade deep RL has achieved remarkable results on a range of problems, from single- and multiplayer games, such as Go, Atari games, and Dota 2, to robotics.

Foundations of Deep Reinforcement Learning is an introduction to deep RL that uniquely combines both theory and implementation. It starts with intuition, then carefully explains the theory of deep RL algorithms, discusses implementations in its companion software library SLM Lab, and finishes with the practical details of getting deep RL to work.

• Understand each key aspect of a deep RL problem
• Explore policy- and value-based algorithms, including REINFORCE, SARSA, DQN, Double DQN, and Prioritized Experience Replay (PER)
• Delve into combined algorithms, including Actor-Critic and Proximal Policy Optimization (PPO)
• Understand how algorithms can be parallelized synchronously and asynchronously
• Run algorithms in SLM Lab and learn the practical implementation details for getting deep RL to work
• Explore algorithm benchmark results with tuned hyperparameters
• Understand how deep RL environments are designed

This guide is ideal for both computer science students and software engineers who are familiar with basic machine learning concepts and have a working understanding of Python.

LAURA GRAESSER is a research software engineer working in robotics at Google. She holds a master's degree in computer science from New York University, where she specialized in machine learning.

WAH LOON KENG is an AI engineer at Machine Zone, where he applies deep reinforcement learning to industrial problems. He has a background in both theoretical physics and computer science.

Together, they've developed two deep RL software libraries and presented many talks and tutorials on the subject. |
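The back-cover text above lists SARSA and DQN among the value-based algorithms covered. As a companion to the policy-gradient sketch earlier in this record, here is a minimal, hedged tabular sketch of ε-greedy action selection and the on-policy SARSA update; the dictionary-based Q-table and the default hyperparameters are illustrative assumptions, not the book's or SLM Lab's implementation.

```python
# Illustrative sketch (not from the book): epsilon-greedy action selection and
# the SARSA temporal-difference update for a tabular Q-function.
import random

def epsilon_greedy(q_table, state, actions, epsilon=0.1):
    # With probability epsilon explore uniformly; otherwise pick the greedy action.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))

def sarsa_update(q_table, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy TD target: uses the action actually selected in the next state.
    target = r + gamma * q_table.get((s_next, a_next), 0.0)
    q_table[(s, a)] = q_table.get((s, a), 0.0) + alpha * (target - q_table.get((s, a), 0.0))
```

The deep variants in the book replace the table with a neural network and the direct assignment with a gradient step on a Q-loss, but the target structure is the same.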
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Graesser, Laura Keng, Wah Loon |
author_GND | (DE-588)1216177937 (DE-588)1205939903 |
author_facet | Graesser, Laura Keng, Wah Loon |
author_role | aut aut |
author_sort | Graesser, Laura |
author_variant | l g lg w l k wl wlk |
building | Verbundindex |
bvnumber | BV046858587 |
callnumber-first | Q - Science |
callnumber-label | Q325 |
callnumber-raw | Q325.6 |
callnumber-search | Q325.6 |
callnumber-sort | Q 3325.6 |
callnumber-subject | Q - General Science |
classification_rvk | ST 300 ST 302 ST 250 |
ctrlnum | (OCoLC)1191892931 (DE-599)KXP1700578286 |
dewey-full | 006.31 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.31 |
dewey-search | 006.31 |
dewey-sort | 16.31 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik |
discipline_str_mv | Informatik |
format | Book |
id | DE-604.BV046858587 |
illustrated | Illustrated |
index_date | 2024-07-03T15:12:09Z |
indexdate | 2024-08-05T08:33:39Z |
institution | BVB |
isbn | 9780135172384 0135172381 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-032267284 |
oclc_num | 1191892931 |
open_access_boolean | |
owner | DE-573 DE-83 DE-29T DE-384 DE-355 DE-BY-UBR DE-858 DE-863 DE-BY-FWS DE-898 DE-BY-UBR |
owner_facet | DE-573 DE-83 DE-29T DE-384 DE-355 DE-BY-UBR DE-858 DE-863 DE-BY-FWS DE-898 DE-BY-UBR |
physical | xxvii, 379 Seiten Illustrationen, Diagramme 23 cm |
publishDate | 2021 |
publishDateSearch | 2021 |
publishDateSort | 2021 |
publisher | Addison-Wesley |
record_format | marc |
series2 | Addison Wesley data & analytics series |
spellingShingle | Graesser, Laura Keng, Wah Loon Foundations of deep reinforcement learning theory and practice in Python Künstliche Intelligenz (DE-588)4033447-8 gnd Python Programmiersprache (DE-588)4434275-5 gnd Operante Konditionierung (DE-588)4172613-3 gnd Maschinelles Lernen (DE-588)4193754-5 gnd |
subject_GND | (DE-588)4033447-8 (DE-588)4434275-5 (DE-588)4172613-3 (DE-588)4193754-5 |
title | Foundations of deep reinforcement learning theory and practice in Python |
title_auth | Foundations of deep reinforcement learning theory and practice in Python |
title_exact_search | Foundations of deep reinforcement learning theory and practice in Python |
title_exact_search_txtP | Foundations of deep reinforcement learning theory and practice in Python |
title_full | Foundations of deep reinforcement learning theory and practice in Python Laura Graesser, Wah Loon Keng |
title_fullStr | Foundations of deep reinforcement learning theory and practice in Python Laura Graesser, Wah Loon Keng |
title_full_unstemmed | Foundations of deep reinforcement learning theory and practice in Python Laura Graesser, Wah Loon Keng |
title_short | Foundations of deep reinforcement learning |
title_sort | foundations of deep reinforcement learning theory and practice in python |
title_sub | theory and practice in Python |
topic | Künstliche Intelligenz (DE-588)4033447-8 gnd Python Programmiersprache (DE-588)4434275-5 gnd Operante Konditionierung (DE-588)4172613-3 gnd Maschinelles Lernen (DE-588)4193754-5 gnd |
topic_facet | Künstliche Intelligenz Python Programmiersprache Operante Konditionierung Maschinelles Lernen |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032267284&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032267284&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT graesserlaura foundationsofdeepreinforcementlearningtheoryandpracticeinpython AT kengwahloon foundationsofdeepreinforcementlearningtheoryandpracticeinpython |
THWS Würzburg, Central Library Reading Room

Call number: | 1000 ST 302 G735 |
---|---|
Copy 1 | Loanable; currently checked out – due back: 30.04.2025 |