Reinforcement learning: industrial applications of intelligent agents
Saved in:
Main author: | Winder, Phil |
---|---|
Format: | Book |
Language: | English |
Published: | Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo : O'Reilly, 2020 |
Edition: | First edition |
Subjects: | Bestärkendes Lernen (Künstliche Intelligenz) |
Online access: | Table of contents |
Description: | xxiii, 379 pages, illustrations, diagrams |
ISBN: | 9781098114831 |
Internal format (MARC)
Tag | Ind1 | Ind2 | Content
---|---|---|---
LEADER | | | 00000nam a2200000 c 4500
001 | | | BV047160991
003 | | | DE-604
005 | | | 20240807
007 | | | t
008 | | | 210224s2020 a||| |||| 00||| eng d
020 | | | |a 9781098114831 |9 978-1-0981-1483-1
035 | | | |a (OCoLC)1245340739
035 | | | |a (DE-599)BVBBV047160991
040 | | | |a DE-604 |b ger |e rda
041 | 0 | | |a eng
049 | | | |a DE-384 |a DE-703 |a DE-898
084 | | | |a ST 300 |0 (DE-625)143650: |2 rvk
084 | | | |a QH 500 |0 (DE-625)141607: |2 rvk
100 | 1 | | |a Winder, Phil |d ca. 20./21. Jh. |e Verfasser |0 (DE-588)1230881646 |4 aut
245 | 1 | 0 | |a Reinforcement learning |b industrial applications of intelligent agents |c Phil Winder, Ph.D.
250 | | | |a First edition
264 | | 1 | |a Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo |b O'Reilly |c 2020
264 | | 4 | |c © 2021
300 | | | |a xxiii, 379 Seiten |b Illustrationen, Diagramme
336 | | | |b txt |2 rdacontent
337 | | | |b n |2 rdamedia
338 | | | |b nc |2 rdacarrier
650 | 0 | 7 | |a Bestärkendes Lernen |g Künstliche Intelligenz |0 (DE-588)4825546-4 |2 gnd |9 rswk-swf
689 | 0 | 0 | |a Bestärkendes Lernen |g Künstliche Intelligenz |0 (DE-588)4825546-4 |D s
689 | 0 | | |5 DE-604
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe |z 978-1-4920-7236-2
856 | 4 | 2 | |m Digitalisierung UB Augsburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032566602&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
943 | 1 | | |a oai:aleph.bib-bvb.de:BVB01-032566602
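The same bibliographic data also appears further down as a MARCXML string in the fullrecord index field. As a minimal illustration only (not part of the catalogue record itself), the following Python sketch shows how a MARCXML snippet of this kind can be read with the standard library alone; the trimmed sample string and the helper function `subfields` are assumptions made for the example.

```python
# Minimal sketch: read a few fields from a MARCXML fragment like the one in
# the "fullrecord" field of this record, using only the standard library.
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Trimmed, illustrative excerpt of the record's MARCXML (ISBN and title only).
marcxml = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="020" ind1=" " ind2=" ">
      <subfield code="a">9781098114831</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Reinforcement learning</subfield>
      <subfield code="b">industrial applications of intelligent agents</subfield>
    </datafield>
  </record>
</collection>"""

root = ET.fromstring(marcxml)
record = root.find("marc:record", NS)

def subfields(rec, tag, code):
    """Return all subfield values for a given MARC tag and subfield code."""
    xpath = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [el.text for el in rec.findall(xpath, NS)]

isbn = subfields(record, "020", "a")      # ['9781098114831']
title = subfields(record, "245", "a")     # ['Reinforcement learning']
subtitle = subfields(record, "245", "b")  # ['industrial applications of intelligent agents']
print(isbn, title, subtitle)
```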
Record in the search index
_version_ | 1806775860802355200 |
---|---|
adam_text |
Table of Contents

Preface  xv

1. Why Reinforcement Learning?  1
    Why Now?  2
    Machine Learning  3
    Reinforcement Learning  4
    When Should You Use RL?  5
    RL Applications  7
    Taxonomy of RL Approaches  8
    Model-Free or Model-Based  8
    How Agents Use and Update Their Strategy  9
    Discrete or Continuous Actions  10
    Optimization Methods  11
    Policy Evaluation and Improvement  11
    Fundamental Concepts in Reinforcement Learning  12
    The First RL Algorithm  12
    Is RL the Same as ML?  15
    Reward and Feedback  16
    Reinforcement Learning as a Discipline  18
    Summary  20
    Further Reading  20

2. Markov Decision Processes, Dynamic Programming, and Monte Carlo Methods  25
    Multi-Arm Bandit Testing  25
    Reward Engineering  26
    Policy Evaluation: The Value Function  26
    Policy Improvement: Choosing the Best Action  29
    Simulating the Environment  31
    Running the Experiment  31
    Improving the ε-greedy Algorithm  33
    Markov Decision Processes  35
    Inventory Control  36
    Inventory Control Simulation  40
    Policies and Value Functions  42
    Discounted Rewards  42
    Predicting Rewards with the State-Value Function  43
    Predicting Rewards with the Action-Value Function  47
    Optimal Policies  48
    Monte Carlo Policy Generation  50
    Value Iteration with Dynamic Programming  52
    Implementing Value Iteration  54
    Results of Value Iteration  56
    Summary  57
    Further Reading  57

3. Temporal-Difference Learning, Q-Learning, and n-Step Algorithms  59
    Formulation of Temporal-Difference Learning  60
    Q-Learning  62
    SARSA  64
    Q-Learning Versus SARSA  65
    Case Study: Automatically Scaling Application Containers to Reduce Cost  68
    Industrial Example: Real-Time Bidding in Advertising  70
    Defining the MDP  70
    Results of the Real-Time Bidding Environments  71
    Further Improvements  73
    Extensions to Q-Learning  74
    Double Q-Learning  74
    Delayed Q-Learning  74
    Comparing Standard, Double, and Delayed Q-learning  75
    Opposition Learning  75
    n-Step Algorithms  76
    n-Step Algorithms on Grid Environments  79
    Eligibility Traces  80
    Extensions to Eligibility Traces  83
    Watkins's Q(λ)  83
    Fuzzy Wipes in Watkins's Q(λ)  84
    Speedy Q-Learning  84
    Accumulating Versus Replacing Eligibility Traces  84
    Summary  85
    Further Reading  85

4. Deep Q-Networks  87
    Deep Learning Architectures  88
    Fundamentals  88
    Common Neural Network Architectures  89
    Deep Learning Frameworks  90
    Deep Reinforcement Learning  91
    Deep Q-Learning  92
    Experience Replay  92
    Q-Network Clones  92
    Neural Network Architecture  93
    Implementing DQN  93
    Example: DQN on the CartPole Environment  94
    Case Study: Reducing Energy Usage in Buildings  98
    Rainbow DQN  99
    Distributional RL  100
    Prioritized Experience Replay  102
    Noisy Nets  102
    Dueling Networks  102
    Example: Rainbow DQN on Atari Games  103
    Results  104
    Discussion  106
    Other DQN Improvements  107
    Improving Exploration  108
    Improving Rewards  109
    Learning from Offline Data  109
    Summary  111
    Further Reading  112

5. Policy Gradient Methods  115
    Benefits of Learning a Policy Directly  115
    How to Calculate the Gradient of a Policy  116
    Policy Gradient Theorem  117
    Policy Functions  119
    Linear Policies  120
    Arbitrary Policies  122
    Basic Implementations  122
    Monte Carlo (REINFORCE)  122
    REINFORCE with Baseline  124
    Gradient Variance Reduction  127
    n-Step Actor-Critic and Advantage Actor-Critic (A2C)  129
    Eligibility Traces Actor-Critic  134
    A Comparison of Basic Policy Gradient Algorithms  135
    Industrial Example: Automatically Purchasing Products for Customers  136
    The Environment: Gym-Shopping-Cart  137
    Expectations  137
    Results from the Shopping Cart Environment  138
    Summary  142
    Further Reading  143

6. Beyond Policy Gradients  145
    Off-Policy Algorithms  145
    Importance Sampling  146
    Behavior and Target Policies  148
    Off-Policy Q-Learning  149
    Gradient Temporal-Difference Learning  149
    Greedy-GQ  150
    Off-Policy Actor-Critics  151
    Deterministic Policy Gradients  152
    Deterministic Policy Gradients  152
    Deep Deterministic Policy Gradients  154
    Twin Delayed DDPG  158
    Case Study: Recommendations Using Reviews  161
    Improvements to DPG  163
    Trust Region Methods  163
    Kullback-Leibler Divergence  165
    Natural Policy Gradients and Trust Region Policy Optimization  167
    Proximal Policy Optimization  169
    Example: Using Servos for a Real-Life Reacher  174
    Experiment Setup  175
    RL Algorithm Implementation  175
    Increasing the Complexity of the Algorithm  177
    Hyperparameter Tuning in a Simulation  178
    Resulting Policies  180
    Other Policy Gradient Algorithms  181
    Retrace(λ)  182
    Actor-Critic with Experience Replay (ACER)  182
    Actor-Critic Using Kronecker-Factored Trust Regions (ACKTR)  183
    Emphatic Methods  183
    Extensions to Policy Gradient Algorithms  184
    Quantile Regression in Policy Gradient Algorithms  184
    Summary  184
    Which Algorithm Should I Use?  185
    A Note on Asynchronous Methods  185
    Further Reading  186

7. Learning All Possible Policies with Entropy Methods  191
    What Is Entropy?  191
    Maximum Entropy Reinforcement Learning  192
    Soft Actor-Critic  193
    SAC Implementation Details and Discrete Action Spaces  194
    Automatically Adjusting Temperature  194
    Case Study: Automated Traffic Management to Reduce Queuing  195
    Extensions to Maximum Entropy Methods  196
    Other Measures of Entropy (and Ensembles)  196
    Optimistic Exploration Using the Upper Bound of Double Q-Learning  196
    Tinkering with Experience Replay  197
    Soft Policy Gradient  197
    Soft Q-Learning (and Derivatives)  197
    Path Consistency Learning  198
    Performance Comparison: SAC Versus PPO  198
    How Does Entropy Encourage Exploration?  200
    How Does the Temperature Parameter Alter Exploration?  203
    Industrial Example: Learning to Drive with a Remote Control Car  205
    Description of the Problem  205
    Minimizing Training Time  205
    Dramatic Actions  208
    Hyperparameter Search  209
    Final Policy  209
    Further Improvements  210
    Summary  211
    Equivalence Between Policy Gradients and Soft Q-Learning  211
    What Does This Mean For the Future?  212
    What Does This Mean Now?  212

8. Improving How an Agent Learns  215
    Rethinking the MDP  216
    Partially Observable Markov Decision Process  216
    Case Study: Using POMDPs in Autonomous Vehicles  218
    Contextual Markov Decision Processes  219
    MDPs with Changing Actions  219
    Regularized MDPs  220
    Hierarchical Reinforcement Learning  220
    Naive HRL  221
    High-Low Hierarchies with Intrinsic Rewards (HIRO)  222
    Learning Skills and Unsupervised RL  223
    Using Skills in HRL  224
    HRL Conclusions  225
    Multi-Agent Reinforcement Learning  225
    MARL Frameworks  226
    Centralized or Decentralized  228
    Single-Agent Algorithms  229
    Case Study: Using Single-Agent Decentralized Learning in UAVs  230
    Centralized Learning, Decentralized Execution  231
    Decentralized Learning  232
    Other Combinations  233
    Challenges of MARL  234
    MARL Conclusions  235
    Expert Guidance  235
    Behavior Cloning  236
    Imitation RL  236
    Inverse RL  237
    Curriculum Learning  238
    Other Paradigms  240
    Meta-Learning  240
    Transfer Learning  240
    Summary  241
    Further Reading  242

9. Practical Reinforcement Learning  249
    The RL Project Life Cycle  249
    Life Cycle Definition  251
    Problem Definition: What Is an RL Project?  254
    RL Problems Are Sequential  254
    RL Problems Are Strategic  255
    Low-Level RL Indicators  256
    Types of Learning  258
    RL Engineering and Refinement  262
    Process  262
    Environment Engineering  263
    State Engineering or State Representation Learning  266
    Policy Engineering  268
    Mapping Policies to Action Spaces  273
    Exploration  277
    Reward Engineering  283
    Summary  287
    Further Reading  288

10. Operational Reinforcement Learning  295
    Implementation  296
    Frameworks  296
    Scaling RL  299
    Evaluation  307
    Deployment  315
    Goals  315
    Architecture  319
    Ancillary Tooling  321
    Safety, Security, and Ethics  326
    Summary  331
    Further Reading  332

11. Conclusions and the Future  339
    Tips and Tricks  339
    Framing the Problem  339
    Your Data  340
    Training  341
    Evaluation  342
    Deployment  343
    Debugging  343
    ${ALGORITHM_NAME} Can't Solve ${ENVIRONMENT}!  345
    Monitoring for Debugging  346
    The Future of Reinforcement Learning  346
    RL Market Opportunities  347
    Future RL and Research Directions  348
    Concluding Remarks  353
    Next Steps  354
    Now It's Your Turn  354
    Further Reading  355

A. The Gradient of a Logistic Policy for Two Actions  357

B. The Gradient of a Softmax Policy  361

Glossary  363
    Acronyms and Common Terms  363
    Symbols and Notation  366

Index  369
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Winder, Phil ca. 20./21. Jh |
author_GND | (DE-588)1230881646 |
author_facet | Winder, Phil ca. 20./21. Jh |
author_role | aut |
author_sort | Winder, Phil ca. 20./21. Jh |
author_variant | p w pw |
building | Verbundindex |
bvnumber | BV047160991 |
classification_rvk | ST 300 QH 500 |
ctrlnum | (OCoLC)1245340739 (DE-599)BVBBV047160991 |
discipline | Informatik Wirtschaftswissenschaften |
discipline_str_mv | Wirtschaftswissenschaften |
edition | First edition |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 c 4500</leader><controlfield tag="001">BV047160991</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20240807</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">210224s2020 a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781098114831</subfield><subfield code="9">978-1-0981-1483-1</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1245340739</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV047160991</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-384</subfield><subfield code="a">DE-703</subfield><subfield code="a">DE-898</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 300</subfield><subfield code="0">(DE-625)143650:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">QH 500</subfield><subfield code="0">(DE-625)141607:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Winder, Phil</subfield><subfield code="d">ca. 20./21. Jh.</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1230881646</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Reinforcement learning</subfield><subfield code="b">industrial applications of intelligent agents</subfield><subfield code="c">Phil Winder, Ph.D.</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">First edition</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo</subfield><subfield code="b">O'Reilly</subfield><subfield code="c">2020</subfield></datafield><datafield tag="264" ind1=" " ind2="4"><subfield code="c">© 2021</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xxiii, 379 Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Bestärkendes Lernen</subfield><subfield code="g">Künstliche Intelligenz</subfield><subfield code="0">(DE-588)4825546-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Bestärkendes Lernen</subfield><subfield code="g">Künstliche Intelligenz</subfield><subfield code="0">(DE-588)4825546-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield 
code="n">Online-Ausgabe</subfield><subfield code="z">978-1-4920-7236-2</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Augsburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032566602&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-032566602</subfield></datafield></record></collection> |
id | DE-604.BV047160991 |
illustrated | Illustrated |
index_date | 2024-07-03T16:40:42Z |
indexdate | 2024-08-08T00:08:53Z |
institution | BVB |
isbn | 9781098114831 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-032566602 |
oclc_num | 1245340739 |
open_access_boolean | |
owner | DE-384 DE-703 DE-898 DE-BY-UBR |
owner_facet | DE-384 DE-703 DE-898 DE-BY-UBR |
physical | xxiii, 379 Seiten Illustrationen, Diagramme |
publishDate | 2020 |
publishDateSearch | 2020 |
publishDateSort | 2020 |
publisher | O'Reilly |
record_format | marc |
spelling | Winder, Phil ca. 20./21. Jh. Verfasser (DE-588)1230881646 aut Reinforcement learning industrial applications of intelligent agents Phil Winder, Ph.D. First edition Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo O'Reilly 2020 © 2021 xxiii, 379 Seiten Illustrationen, Diagramme txt rdacontent n rdamedia nc rdacarrier Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd rswk-swf Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 s DE-604 Erscheint auch als Online-Ausgabe 978-1-4920-7236-2 Digitalisierung UB Augsburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032566602&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Winder, Phil ca. 20./21. Jh Reinforcement learning industrial applications of intelligent agents Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd |
subject_GND | (DE-588)4825546-4 |
title | Reinforcement learning industrial applications of intelligent agents |
title_auth | Reinforcement learning industrial applications of intelligent agents |
title_exact_search | Reinforcement learning industrial applications of intelligent agents |
title_exact_search_txtP | Reinforcement learning industrial applications of intelligent agents |
title_full | Reinforcement learning industrial applications of intelligent agents Phil Winder, Ph.D. |
title_fullStr | Reinforcement learning industrial applications of intelligent agents Phil Winder, Ph.D. |
title_full_unstemmed | Reinforcement learning industrial applications of intelligent agents Phil Winder, Ph.D. |
title_short | Reinforcement learning |
title_sort | reinforcement learning industrial applications of intelligent agents |
title_sub | industrial applications of intelligent agents |
topic | Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd |
topic_facet | Bestärkendes Lernen Künstliche Intelligenz |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032566602&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT winderphil reinforcementlearningindustrialapplicationsofintelligentagents |