Reinforcement learning algorithms with Python: learn, understand, and develop smart algorithms for addressing AI challenges
Saved in:

Main author: Lonza, Andrea
Format: Book
Language: English
Published: Birmingham ; Mumbai : Packt Publishing, October 2019
Subjects: Python (programming language); Operant conditioning
Summary: With this book, you will understand the core concepts and techniques of reinforcement learning. You will look into each RL algorithm and develop your own self-learning algorithms and models. You will optimize the algorithms for better precision, use high-speed actions, and lower the risk of anomalies in your applications.
Note: Implementing REINFORCE with baseline
Physical description: vii, 351 pages : illustrations, diagrams
ISBN: 9781789131116
Internal format
MARC
LEADER 00000nam a2200000 c 4500
001    BV046768925
003    DE-604
007    t
008    200617s2019 a||| |||| 00||| eng d
020 __ |a 9781789131116 |9 978-1-78913-111-6
035 __ |a (OCoLC)1164646408
035 __ |a (DE-599)BVBBV046768925
040 __ |a DE-604 |b ger |e rda
041 0_ |a eng
049 __ |a DE-573 |a DE-858 |a DE-898
084 __ |a ST 250 |0 (DE-625)143626: |2 rvk
084 __ |a ST 300 |0 (DE-625)143650: |2 rvk
100 1_ |a Lonza, Andrea |e Verfasser |4 aut
245 10 |a Reinforcement learning algorithms with Python |b learn, understand, and develop smart algorithms for addressing AI challenges |c Andrea Lonza
264 _1 |a Birmingham ; Mumbai |b Packt Publishing |c October 2019
300 __ |a vii, 351 Seiten |b Illustrationen, Diagramme
336 __ |b txt |2 rdacontent
337 __ |b n |2 rdamedia
338 __ |b nc |2 rdacarrier
500 __ |a Implementing REINFORCE with baseline
505 8_ |a Cover; Title Page; Copyright and Credits; Dedication; About Packt; Contributors; Table of Contents; Preface; Section 1: Algorithms and Environments; Chapter 1: The Landscape of Reinforcement Learning; An introduction to RL; Comparing RL and supervised learning; History of RL; Deep RL; Elements of RL; Policy; The value function; Reward; Model; Applications of RL; Games; Robotics and Industry 4.0; Machine learning; Economics and finance; Healthcare; Intelligent transportation systems; Energy optimization and smart grid; Summary; Questions; Further reading
505 8_ |a Chapter 2: Implementing RL Cycle and OpenAI Gym; Setting up the environment; Installing OpenAI Gym; Installing Roboschool; OpenAI Gym and RL cycles; Developing an RL cycle; Getting used to spaces; Development of ML models using TensorFlow; Tensor; Constant; Placeholder; Variable; Creating a graph; Simple linear regression example; Introducing TensorBoard; Types of RL environments; Why different environments?; Open source environments; Summary; Questions; Further reading; Chapter 3: Solving Problems with Dynamic Programming; MDP; Policy; Return; Value functions; Bellman equation
505 8_ |a Categorizing RL algorithms; Model-free algorithms; Value-based algorithms; Policy gradient algorithms; Actor-Critic algorithms; Hybrid algorithms; Model-based RL; Algorithm diversity; Dynamic programming; Policy evaluation and policy improvement; Policy iteration; Policy iteration applied to FrozenLake; Value iteration; Value iteration applied to FrozenLake; Summary; Questions; Further reading; Section 2: Model-Free RL Algorithms; Chapter 4: Q-Learning and SARSA Applications; Learning without a model; User experience; Policy evaluation; The exploration problem; Why explore?; How to explore
505 8_ |a TD learning; TD update; Policy improvement; Comparing Monte Carlo and TD; SARSA; The algorithm; Applying SARSA to Taxi-v2; Q-learning; Theory; The algorithm; Applying Q-learning to Taxi-v2; Comparing SARSA and Q-learning; Summary; Questions; Chapter 5: Deep Q-Network; Deep neural networks and Q-learning; Function approximation; Q-learning with neural networks; Deep Q-learning instabilities; DQN; The solution; Replay memory; The target network; The DQN algorithm; The loss function; Pseudocode; Model architecture; DQN applied to Pong; Atari games; Preprocessing; DQN implementation; DNNs
505 8_ |a The experienced buffer; The computational graph and training loop; Results; DQN variations; Double DQN; DDQN implementation; Results; Dueling DQN; Dueling DQN implementation; Results; N-step DQN; Implementation; Results; Summary; Questions; Further reading; Chapter 6: Learning Stochastic and PG Optimization; Policy gradient methods; The gradient of the policy; Policy gradient theorem; Computing the gradient; The policy; On-policy PG; Understanding the REINFORCE algorithm; Implementing REINFORCE; Landing a spacecraft using REINFORCE; Analyzing the results; REINFORCE with baseline
520 3_ |a With this book, you will understand the core concepts and techniques of reinforcement learning. You will take a look into each RL algorithm and will develop your own self-learning algorithms and models. You will optimize the algorithms for better precision, use high-speed actions and lower the risk of anomalies in your applications
650 07 |a Python |g Programmiersprache |0 (DE-588)4434275-5 |2 gnd |9 rswk-swf
650 07 |a Operante Konditionierung |0 (DE-588)4172613-3 |2 gnd |9 rswk-swf
653 _0 |a Computer algorithms
653 _0 |a Python (Computer program language)
653 _6 |a Electronic books
689 00 |a Python |g Programmiersprache |0 (DE-588)4434275-5 |D s
689 01 |a Operante Konditionierung |0 (DE-588)4172613-3 |D s
689 0_ |5 DE-604
776 08 |i Erscheint auch als |n Online-Ausgabe |z 978-1-78913-970-9