Reinforcement learning: industrial applications of intelligent agents
Saved in:
Main author: | Winder, Phil |
---|---|
Format: | Book |
Language: | English |
Published: | Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo : O'Reilly, 2020 |
Edition: | First edition |
Subjects: | Bestärkendes Lernen (Künstliche Intelligenz) |
Online access: | Table of contents |
Description: | xxiii, 379 pages, illustrations, diagrams |
ISBN: | 9781098114831 |
Internal format (MARC)
Tag | Ind1 | Ind2 | Content
---|---|---|---
LEADER | | | 00000nam a2200000 c 4500
001 | | | BV047160991
003 | | | DE-604
005 | | | 20240807
007 | | | t
008 | | | 210224s2020 a||| |||| 00||| eng d
020 | | | |a 9781098114831 |9 978-1-0981-1483-1
035 | | | |a (OCoLC)1245340739
035 | | | |a (DE-599)BVBBV047160991
040 | | | |a DE-604 |b ger |e rda
041 | 0 | | |a eng
049 | | | |a DE-384 |a DE-703 |a DE-898
084 | | | |a ST 300 |0 (DE-625)143650: |2 rvk
084 | | | |a QH 500 |0 (DE-625)141607: |2 rvk
100 | 1 | | |a Winder, Phil |d ca. 20./21. Jh. |e Verfasser |0 (DE-588)1230881646 |4 aut
245 | 1 | 0 | |a Reinforcement learning |b industrial applications of intelligent agents |c Phil Winder, Ph.D.
250 | | | |a First edition
264 | | 1 | |a Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo |b O'Reilly |c 2020
264 | | 4 | |c © 2021
300 | | | |a xxiii, 379 Seiten |b Illustrationen, Diagramme
336 | | | |b txt |2 rdacontent
337 | | | |b n |2 rdamedia
338 | | | |b nc |2 rdacarrier
650 | 0 | 7 | |a Bestärkendes Lernen |g Künstliche Intelligenz |0 (DE-588)4825546-4 |2 gnd |9 rswk-swf
689 | 0 | 0 | |a Bestärkendes Lernen |g Künstliche Intelligenz |0 (DE-588)4825546-4 |D s
689 | 0 | | |5 DE-604
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe |z 978-1-4920-7236-2
856 | 4 | 2 | |m Digitalisierung UB Augsburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032566602&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
943 | 1 | | |a oai:aleph.bib-bvb.de:BVB01-032566602
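The same bibliographic data also appears further down as a MARCXML string in the fullrecord index field. As a minimal illustration only (not part of the catalogue record itself), the following Python sketch shows how a MARCXML snippet of this kind can be read with the standard library alone; the trimmed sample string and the helper function `subfields` are assumptions made for the example.

```python
# Minimal sketch: read a few fields from a MARCXML fragment like the one in
# the "fullrecord" field of this record, using only the standard library.
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Trimmed, illustrative excerpt of the record's MARCXML (ISBN and title only).
marcxml = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="020" ind1=" " ind2=" ">
      <subfield code="a">9781098114831</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Reinforcement learning</subfield>
      <subfield code="b">industrial applications of intelligent agents</subfield>
    </datafield>
  </record>
</collection>"""

root = ET.fromstring(marcxml)
record = root.find("marc:record", NS)

def subfields(rec, tag, code):
    """Return all subfield values for a given MARC tag and subfield code."""
    xpath = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [el.text for el in rec.findall(xpath, NS)]

isbn = subfields(record, "020", "a")      # ['9781098114831']
title = subfields(record, "245", "a")     # ['Reinforcement learning']
subtitle = subfields(record, "245", "b")  # ['industrial applications of intelligent agents']
print(isbn, title, subtitle)
```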
Record in the search index
_version_ | 1806775860802355200 |
---|---|
adam_text |
Table of Contents

Preface  xv

1. Why Reinforcement Learning?  1
    Why Now?  2
    Machine Learning  3
    Reinforcement Learning  4
    When Should You Use RL?  5
    RL Applications  7
    Taxonomy of RL Approaches  8
    Model-Free or Model-Based  8
    How Agents Use and Update Their Strategy  9
    Discrete or Continuous Actions  10
    Optimization Methods  11
    Policy Evaluation and Improvement  11
    Fundamental Concepts in Reinforcement Learning  12
    The First RL Algorithm  12
    Is RL the Same as ML?  15
    Reward and Feedback  16
    Reinforcement Learning as a Discipline  18
    Summary  20
    Further Reading  20

2. Markov Decision Processes, Dynamic Programming, and Monte Carlo Methods  25
    Multi-Arm Bandit Testing  25
    Reward Engineering  26
    Policy Evaluation: The Value Function  26
    Policy Improvement: Choosing the Best Action  29
    Simulating the Environment  31
    Running the Experiment  31
    Improving the ε-greedy Algorithm  33
    Markov Decision Processes  35
    Inventory Control  36
    Inventory Control Simulation  40
    Policies and Value Functions  42
    Discounted Rewards  42
    Predicting Rewards with the State-Value Function  43
    Predicting Rewards with the Action-Value Function  47
    Optimal Policies  48
    Monte Carlo Policy Generation  50
    Value Iteration with Dynamic Programming  52
    Implementing Value Iteration  54
    Results of Value Iteration  56
    Summary  57
    Further Reading  57

3. Temporal-Difference Learning, Q-Learning, and n-Step Algorithms  59
    Formulation of Temporal-Difference Learning  60
    Q-Learning  62
    SARSA  64
    Q-Learning Versus SARSA  65
    Case Study: Automatically Scaling Application Containers to Reduce Cost  68
    Industrial Example: Real-Time Bidding in Advertising  70
    Defining the MDP  70
    Results of the Real-Time Bidding Environments  71
    Further Improvements  73
    Extensions to Q-Learning  74
    Double Q-Learning  74
    Delayed Q-Learning  74
    Comparing Standard, Double, and Delayed Q-learning  75
    Opposition Learning  75
    n-Step Algorithms  76
    n-Step Algorithms on Grid Environments  79
    Eligibility Traces  80
    Extensions to Eligibility Traces  83
    Watkins's Q(λ)  83
    Fuzzy Wipes in Watkins's Q(λ)  84
    Speedy Q-Learning  84
    Accumulating Versus Replacing Eligibility Traces  84
    Summary  85
    Further Reading  85

4. Deep Q-Networks  87
    Deep Learning Architectures  88
    Fundamentals  88
    Common Neural Network Architectures  89
    Deep Learning Frameworks  90
    Deep Reinforcement Learning  91
    Deep Q-Learning  92
    Experience Replay  92
    Q-Network Clones  92
    Neural Network Architecture  93
    Implementing DQN  93
    Example: DQN on the CartPole Environment  94
    Case Study: Reducing Energy Usage in Buildings  98
    Rainbow DQN  99
    Distributional RL  100
    Prioritized Experience Replay  102
    Noisy Nets  102
    Dueling Networks  102
    Example: Rainbow DQN on Atari Games  103
    Results  104
    Discussion  106
    Other DQN Improvements  107
    Improving Exploration  108
    Improving Rewards  109
    Learning from Offline Data  109
    Summary  111
    Further Reading  112

5. Policy Gradient Methods  115
    Benefits of Learning a Policy Directly  115
    How to Calculate the Gradient of a Policy  116
    Policy Gradient Theorem  117
    Policy Functions  119
    Linear Policies  120
    Arbitrary Policies  122
    Basic Implementations  122
    Monte Carlo (REINFORCE)  122
    REINFORCE with Baseline  124
    Gradient Variance Reduction  127
    n-Step Actor-Critic and Advantage Actor-Critic (A2C)  129
    Eligibility Traces Actor-Critic  134
    A Comparison of Basic Policy Gradient Algorithms  135
    Industrial Example: Automatically Purchasing Products for Customers  136
    The Environment: Gym-Shopping-Cart  137
    Expectations  137
    Results from the Shopping Cart Environment  138
    Summary  142
    Further Reading  143

6. Beyond Policy Gradients  145
    Off-Policy Algorithms  145
    Importance Sampling  146
    Behavior and Target Policies  148
    Off-Policy Q-Learning  149
    Gradient Temporal-Difference Learning  149
    Greedy-GQ  150
    Off-Policy Actor-Critics  151
    Deterministic Policy Gradients  152
    Deterministic Policy Gradients  152
    Deep Deterministic Policy Gradients  154
    Twin Delayed DDPG  158
    Case Study: Recommendations Using Reviews  161
    Improvements to DPG  163
    Trust Region Methods  163
    Kullback-Leibler Divergence  165
    Natural Policy Gradients and Trust Region Policy Optimization  167
    Proximal Policy Optimization  169
    Example: Using Servos for a Real-Life Reacher  174
    Experiment Setup  175
    RL Algorithm Implementation  175
    Increasing the Complexity of the Algorithm  177
    Hyperparameter Tuning in a Simulation  178
    Resulting Policies  180
    Other Policy Gradient Algorithms  181
    Retrace(λ)  182
    Actor-Critic with Experience Replay (ACER)  182
    Actor-Critic Using Kronecker-Factored Trust Regions (ACKTR)  183
    Emphatic Methods  183
    Extensions to Policy Gradient Algorithms  184
    Quantile Regression in Policy Gradient Algorithms  184
    Summary  184
    Which Algorithm Should I Use?  185
    A Note on Asynchronous Methods  185
    Further Reading  186

7. Learning All Possible Policies with Entropy Methods  191
    What Is Entropy?  191
    Maximum Entropy Reinforcement Learning  192
    Soft Actor-Critic  193
    SAC Implementation Details and Discrete Action Spaces  194
    Automatically Adjusting Temperature  194
    Case Study: Automated Traffic Management to Reduce Queuing  195
    Extensions to Maximum Entropy Methods  196
    Other Measures of Entropy (and Ensembles)  196
    Optimistic Exploration Using the Upper Bound of Double Q-Learning  196
    Tinkering with Experience Replay  197
    Soft Policy Gradient  197
    Soft Q-Learning (and Derivatives)  197
    Path Consistency Learning  198
    Performance Comparison: SAC Versus PPO  198
    How Does Entropy Encourage Exploration?  200
    How Does the Temperature Parameter Alter Exploration?  203
    Industrial Example: Learning to Drive with a Remote Control Car  205
    Description of the Problem  205
    Minimizing Training Time  205
    Dramatic Actions  208
    Hyperparameter Search  209
    Final Policy  209
    Further Improvements  210
    Summary  211
    Equivalence Between Policy Gradients and Soft Q-Learning  211
    What Does This Mean For the Future?  212
    What Does This Mean Now?  212

8. Improving How an Agent Learns  215
    Rethinking the MDP  216
    Partially Observable Markov Decision Process  216
    Case Study: Using POMDPs in Autonomous Vehicles  218
    Contextual Markov Decision Processes  219
    MDPs with Changing Actions  219
    Regularized MDPs  220
    Hierarchical Reinforcement Learning  220
    Naive HRL  221
    High-Low Hierarchies with Intrinsic Rewards (HIRO)  222
    Learning Skills and Unsupervised RL  223
    Using Skills in HRL  224
    HRL Conclusions  225
    Multi-Agent Reinforcement Learning  225
    MARL Frameworks  226
    Centralized or Decentralized  228
    Single-Agent Algorithms  229
    Case Study: Using Single-Agent Decentralized Learning in UAVs  230
    Centralized Learning, Decentralized Execution  231
    Decentralized Learning  232
    Other Combinations  233
    Challenges of MARL  234
    MARL Conclusions  235
    Expert Guidance  235
    Behavior Cloning  236
    Imitation RL  236
    Inverse RL  237
    Curriculum Learning  238
    Other Paradigms  240
    Meta-Learning  240
    Transfer Learning  240
    Summary  241
    Further Reading  242

9. Practical Reinforcement Learning  249
    The RL Project Life Cycle  249
    Life Cycle Definition  251
    Problem Definition: What Is an RL Project?  254
    RL Problems Are Sequential  254
    RL Problems Are Strategic  255
    Low-Level RL Indicators  256
    Types of Learning  258
    RL Engineering and Refinement  262
    Process  262
    Environment Engineering  263
    State Engineering or State Representation Learning  266
    Policy Engineering  268
    Mapping Policies to Action Spaces  273
    Exploration  277
    Reward Engineering  283
    Summary  287
    Further Reading  288

10. Operational Reinforcement Learning  295
    Implementation  296
    Frameworks  296
    Scaling RL  299
    Evaluation  307
    Deployment  315
    Goals  315
    Architecture  319
    Ancillary Tooling  321
    Safety, Security, and Ethics  326
    Summary  331
    Further Reading  332

11. Conclusions and the Future  339
    Tips and Tricks  339
    Framing the Problem  339
    Your Data  340
    Training  341
    Evaluation  342
    Deployment  343
    Debugging  343
    ${ALGORITHM_NAME} Can't Solve ${ENVIRONMENT}!  345
    Monitoring for Debugging  346
    The Future of Reinforcement Learning  346
    RL Market Opportunities  347
    Future RL and Research Directions  348
    Concluding Remarks  353
    Next Steps  354
    Now It's Your Turn  354
    Further Reading  355

A. The Gradient of a Logistic Policy for Two Actions  357

B. The Gradient of a Softmax Policy  361

Glossary  363
    Acronyms and Common Terms  363
    Symbols and Notation  366

Index  369
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Winder, Phil ca. 20./21. Jh |
author_GND | (DE-588)1230881646 |
author_facet | Winder, Phil ca. 20./21. Jh |
author_role | aut |
author_sort | Winder, Phil ca. 20./21. Jh |
author_variant | p w pw |
building | Verbundindex |
bvnumber | BV047160991 |
classification_rvk | ST 300 QH 500 |
ctrlnum | (OCoLC)1245340739 (DE-599)BVBBV047160991 |
discipline | Informatik Wirtschaftswissenschaften |
discipline_str_mv | Wirtschaftswissenschaften |
edition | First edition |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 c 4500</leader><controlfield tag="001">BV047160991</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20240807</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">210224s2020 a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781098114831</subfield><subfield code="9">978-1-0981-1483-1</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1245340739</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV047160991</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-384</subfield><subfield code="a">DE-703</subfield><subfield code="a">DE-898</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 300</subfield><subfield code="0">(DE-625)143650:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">QH 500</subfield><subfield code="0">(DE-625)141607:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Winder, Phil</subfield><subfield code="d">ca. 20./21. Jh.</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1230881646</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Reinforcement learning</subfield><subfield code="b">industrial applications of intelligent agents</subfield><subfield code="c">Phil Winder, Ph.D.</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">First edition</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo</subfield><subfield code="b">O'Reilly</subfield><subfield code="c">2020</subfield></datafield><datafield tag="264" ind1=" " ind2="4"><subfield code="c">© 2021</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xxiii, 379 Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Bestärkendes Lernen</subfield><subfield code="g">Künstliche Intelligenz</subfield><subfield code="0">(DE-588)4825546-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Bestärkendes Lernen</subfield><subfield code="g">Künstliche Intelligenz</subfield><subfield code="0">(DE-588)4825546-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield 
code="n">Online-Ausgabe</subfield><subfield code="z">978-1-4920-7236-2</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Augsburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032566602&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-032566602</subfield></datafield></record></collection> |
id | DE-604.BV047160991 |
illustrated | Illustrated |
index_date | 2024-07-03T16:40:42Z |
indexdate | 2024-08-08T00:08:53Z |
institution | BVB |
isbn | 9781098114831 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-032566602 |
oclc_num | 1245340739 |
open_access_boolean | |
owner | DE-384 DE-703 DE-898 DE-BY-UBR |
owner_facet | DE-384 DE-703 DE-898 DE-BY-UBR |
physical | xxiii, 379 Seiten Illustrationen, Diagramme |
publishDate | 2020 |
publishDateSearch | 2020 |
publishDateSort | 2020 |
publisher | O'Reilly |
record_format | marc |
spelling | Winder, Phil ca. 20./21. Jh. Verfasser (DE-588)1230881646 aut Reinforcement learning industrial applications of intelligent agents Phil Winder, Ph.D. First edition Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo O'Reilly 2020 © 2021 xxiii, 379 Seiten Illustrationen, Diagramme txt rdacontent n rdamedia nc rdacarrier Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd rswk-swf Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 s DE-604 Erscheint auch als Online-Ausgabe 978-1-4920-7236-2 Digitalisierung UB Augsburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032566602&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Winder, Phil ca. 20./21. Jh Reinforcement learning industrial applications of intelligent agents Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd |
subject_GND | (DE-588)4825546-4 |
title | Reinforcement learning industrial applications of intelligent agents |
title_auth | Reinforcement learning industrial applications of intelligent agents |
title_exact_search | Reinforcement learning industrial applications of intelligent agents |
title_exact_search_txtP | Reinforcement learning industrial applications of intelligent agents |
title_full | Reinforcement learning industrial applications of intelligent agents Phil Winder, Ph.D. |
title_fullStr | Reinforcement learning industrial applications of intelligent agents Phil Winder, Ph.D. |
title_full_unstemmed | Reinforcement learning industrial applications of intelligent agents Phil Winder, Ph.D. |
title_short | Reinforcement learning |
title_sort | reinforcement learning industrial applications of intelligent agents |
title_sub | industrial applications of intelligent agents |
topic | Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd |
topic_facet | Bestärkendes Lernen Künstliche Intelligenz |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032566602&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT winderphil reinforcementlearningindustrialapplicationsofintelligentagents |