Reinforcement learning: an introduction
Saved in:
Main authors: | Sutton, Richard S., Barto, Andrew |
---|---|
Format: | Book |
Language: | English |
Published: | Cambridge, Massachusetts ; London, England : The MIT Press, [2018] |
Edition: | Second edition |
Series: | Adaptive computation and machine learning |
Subjects: | Künstliche Intelligenz ; Maschinelles Lernen |
Online access: | Table of contents |
Notes: | Later unaltered reprints are also covered by this record |
Physical description: | xxii, 526 pages, diagrams |
ISBN: | 9780262039246 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV045195963 | ||
003 | DE-604 | ||
005 | 20231201 | ||
007 | t | ||
008 | 180917s2018 xxu|||| |||| 00||| eng d | ||
010 | |a 018023826 | ||
020 | |a 9780262039246 |c hardback |9 978-0-262-03924-6 | ||
035 | |a (OCoLC)1050693967 | ||
035 | |a (DE-599)BVBBV045195963 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
044 | |a xxu |c US | ||
049 | |a DE-384 |a DE-91G |a DE-706 |a DE-739 |a DE-703 |a DE-863 |a DE-523 |a DE-20 |a DE-83 |a DE-945 |a DE-91 |a DE-1050 |a DE-188 |a DE-29T |a DE-M347 |a DE-473 |a DE-Aug4 |a DE-92 |a DE-898 |a DE-824 |a DE-1028 |a DE-860 |a DE-1043 |a DE-355 |a DE-634 |a DE-861 |a DE-859 |a DE-862 | ||
050 | 0 | |a Q325.6 | |
082 | 0 | |a 006.3/1 |2 23 | |
084 | |a ST 285 |0 (DE-625)143648: |2 rvk | ||
084 | |a QH 740 |0 (DE-625)141614: |2 rvk | ||
084 | |a ST 300 |0 (DE-625)143650: |2 rvk | ||
084 | |a ST 302 |0 (DE-625)143652: |2 rvk | ||
084 | |a DAT 708f |2 stub | ||
084 | |a 68T05 |2 msc | ||
100 | 1 | |a Sutton, Richard S. |0 (DE-588)1099442435 |4 aut | |
245 | 1 | 0 | |a Reinforcement learning |b an introduction |c Richard S. Sutton and Andrew G. Barto |
250 | |a Second edition | ||
264 | 1 | |a Cambridge, Massachusetts ; London, England |b The MIT Press |c [2018] | |
264 | 4 | |c © 2018 | |
300 | |a xxii, 526 Seiten |b Diagramme | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Adaptive computation and machine learning | |
500 | |a Hier auch später erschienene, unveränderte Nachdrucke | ||
650 | 0 | 7 | |a Künstliche Intelligenz |0 (DE-588)4033447-8 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Künstliche Intelligenz |0 (DE-588)4033447-8 |D s |
689 | 0 | 1 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Barto, Andrew |0 (DE-588)1099442664 |4 aut | |
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe |d 2020 |w (DE-604)BV047234781 |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030585045&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-030585045 |
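The MARC table above is the machine-readable core of this record. As a minimal sketch (not part of the catalogue itself), the following shows how such a record could be read with Python's standard library, assuming it has been exported as namespaced MARCXML to a hypothetical file named record.xml; as shown above, field 245 carries the title, 100/700 the authors, and 020 the ISBN.

```python
# Minimal sketch: reading a hypothetical MARCXML export ("record.xml") of this
# record with the Python standard library. The file name is an assumption,
# not an artifact of this catalogue page.
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

def subfields(record, tag, code):
    """Collect every subfield `code` value from datafields with tag `tag`."""
    return [
        sf.text
        for df in record.findall(f"marc:datafield[@tag='{tag}']", NS)
        for sf in df.findall(f"marc:subfield[@code='{code}']", NS)
    ]

root = ET.parse("record.xml").getroot()
# Accept either a bare <record> or a <collection> wrapper around it.
record = root.find("marc:record", NS) if root.tag.endswith("collection") else root

print("Title:  ", subfields(record, "245", "a"))  # ['Reinforcement learning']
print("Authors:", subfields(record, "100", "a") + subfields(record, "700", "a"))
print("ISBN:   ", subfields(record, "020", "a"))  # ['9780262039246']
```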
Record in the search index
DE-BY-862_location | 2000 |
---|---|
DE-BY-863_location | 1000 |
DE-BY-FWS_call_number | 2000/ST 302 S967(2) 1000/ST 302 S967 |
DE-BY-FWS_katkey | 707141 |
DE-BY-FWS_media_number | 083000508710 083101415866 083000508711 |
_version_ | 1816842961373102080 |
adam_text |
Contents

Preface to the Second Edition xiii
Preface to the First Edition xvii
Summary of Notation xix

1 Introduction 1
  1.1 Reinforcement Learning 1
  1.2 Examples 4
  1.3 Elements of Reinforcement Learning 6
  1.4 Limitations and Scope 7
  1.5 An Extended Example: Tic-Tac-Toe 8
  1.6 Summary 13
  1.7 Early History of Reinforcement Learning 13

I Tabular Solution Methods 23

2 Multi-armed Bandits 25
  2.1 A k-armed Bandit Problem 25
  2.2 Action-value Methods 27
  2.3 The 10-armed Testbed 28
  2.4 Incremental Implementation 30
  2.5 Tracking a Nonstationary Problem 32
  2.6 Optimistic Initial Values 34
  2.7 Upper-Confidence-Bound Action Selection 35
  2.8 Gradient Bandit Algorithms 37
  2.9 Associative Search (Contextual Bandits) 41
  2.10 Summary 42

3 Finite Markov Decision Processes 47
  3.1 The Agent-Environment Interface 47
  3.2 Goals and Rewards 53
  3.3 Returns and Episodes 54
  3.4 Unified Notation for Episodic and Continuing Tasks 57
  3.5 Policies and Value Functions 58
  3.6 Optimal Policies and Optimal Value Functions 62
  3.7 Optimality and Approximation 67
  3.8 Summary 68

4 Dynamic Programming 73
  4.1 Policy Evaluation (Prediction) 74
  4.2 Policy Improvement 76
  4.3 Policy Iteration 80
  4.4 Value Iteration 82
  4.5 Asynchronous Dynamic Programming 85
  4.6 Generalized Policy Iteration 86
  4.7 Efficiency of Dynamic Programming 87
  4.8 Summary 88

5 Monte Carlo Methods 91
  5.1 Monte Carlo Prediction 92
  5.2 Monte Carlo Estimation of Action Values 96
  5.3 Monte Carlo Control 97
  5.4 Monte Carlo Control without Exploring Starts 100
  5.5 Off-policy Prediction via Importance Sampling 103
  5.6 Incremental Implementation 109
  5.7 Off-policy Monte Carlo Control 110
  5.8 *Discounting-aware Importance Sampling 112
  5.9 *Per-decision Importance Sampling 114
  5.10 Summary 115

6 Temporal-Difference Learning 119
  6.1 TD Prediction 119
  6.2 Advantages of TD Prediction Methods 124
  6.3 Optimality of TD(0) 126
  6.4 Sarsa: On-policy TD Control 129
  6.5 Q-learning: Off-policy TD Control 131
  6.6 Expected Sarsa 133
  6.7 Maximization Bias and Double Learning 134
  6.8 Games, Afterstates, and Other Special Cases 136
  6.9 Summary 138

7 n-step Bootstrapping 141
  7.1 n-step TD Prediction 142
  7.2 n-step Sarsa 145
  7.3 n-step Off-policy Learning 148
  7.4 *Per-decision Methods with Control Variates 150
  7.5 Off-policy Learning Without Importance Sampling: The n-step Tree Backup Algorithm 152
  7.6 *A Unifying Algorithm: n-step Q(σ) 154
  7.7 Summary 157

8 Planning and Learning with Tabular Methods 159
  8.1 Models and Planning 159
  8.2 Dyna: Integrated Planning, Acting, and Learning 161
  8.3 When the Model Is Wrong 166
  8.4 Prioritized Sweeping 168
  8.5 Expected vs. Sample Updates 172
  8.6 Trajectory Sampling 174
  8.7 Real-time Dynamic Programming 177
  8.8 Planning at Decision Time 180
  8.9 Heuristic Search 181
  8.10 Rollout Algorithms 183
  8.11 Monte Carlo Tree Search 185
  8.12 Summary of the Chapter 188
  8.13 Summary of Part I: Dimensions 189

II Approximate Solution Methods 195

9 On-policy Prediction with Approximation 197
  9.1 Value-function Approximation 198
  9.2 The Prediction Objective (VE) 199
  9.3 Stochastic-gradient and Semi-gradient Methods 200
  9.4 Linear Methods 204
  9.5 Feature Construction for Linear Methods 210
    9.5.1 Polynomials 210
    9.5.2 Fourier Basis 211
    9.5.3 Coarse Coding 215
    9.5.4 Tile Coding 217
    9.5.5 Radial Basis Functions 221
  9.6 Selecting Step-Size Parameters Manually 222
  9.7 Nonlinear Function Approximation: Artificial Neural Networks 223
  9.8 Least-Squares TD 228
  9.9 Memory-based Function Approximation 230
  9.10 Kernel-based Function Approximation 232
  9.11 Looking Deeper at On-policy Learning: Interest and Emphasis 234
  9.12 Summary 236

10 On-policy Control with Approximation 243
  10.1 Episodic Semi-gradient Control 243
  10.2 Semi-gradient n-step Sarsa 247
  10.3 Average Reward: A New Problem Setting for Continuing Tasks 249
  10.4 Deprecating the Discounted Setting 253
  10.5 Differential Semi-gradient n-step Sarsa 255
  10.6 Summary 256

11 *Off-policy Methods with Approximation 257
  11.1 Semi-gradient Methods 258
  11.2 Examples of Off-policy Divergence 260
  11.3 The Deadly Triad 264
  11.4 Linear Value-function Geometry 266
  11.5 Gradient Descent in the Bellman Error 269
  11.6 The Bellman Error is Not Learnable 274
  11.7 Gradient-TD Methods 278
  11.8 Emphatic-TD Methods 281
  11.9 Reducing Variance 283
  11.10 Summary 284

12 Eligibility Traces 287
  12.1 The λ-return 288
  12.2 TD(λ) 292
  12.3 n-step Truncated λ-return Methods 295
  12.4 Redoing Updates: Online λ-return Algorithm 297
  12.5 True Online TD(λ) 299
  12.6 *Dutch Traces in Monte Carlo Learning 301
  12.7 Sarsa(λ) 303
  12.8 Variable λ and γ 307
  12.9 Off-policy Traces with Control Variates 309
  12.10 Watkins's Q(λ) to Tree-Backup(λ) 312
  12.11 Stable Off-policy Methods with Traces 314
  12.12 Implementation Issues 316
  12.13 Conclusions 317

13 Policy Gradient Methods 321
  13.1 Policy Approximation and its Advantages 322
  13.2 The Policy Gradient Theorem 324
  13.3 REINFORCE: Monte Carlo Policy Gradient 326
  13.4 REINFORCE with Baseline 329
  13.5 Actor-Critic Methods 331
  13.6 Policy Gradient for Continuing Problems 333
  13.7 Policy Parameterization for Continuous Actions 335
  13.8 Summary 337

III Looking Deeper 339

14 Psychology 341
  14.1 Prediction and Control 342
  14.2 Classical Conditioning 343
    14.2.1 Blocking and Higher-order Conditioning 345
    14.2.2 The Rescorla-Wagner Model 346
    14.2.3 The TD Model 349
    14.2.4 TD Model Simulations 350
  14.3 Instrumental Conditioning 357
  14.4 Delayed Reinforcement 361
  14.5 Cognitive Maps 363
  14.6 Habitual and Goal-directed Behavior 364
  14.7 Summary 368

15 Neuroscience 377
  15.1 Neuroscience Basics 378
  15.2 Reward Signals, Reinforcement Signals, Values, and Prediction Errors 380
  15.3 The Reward Prediction Error Hypothesis 381
  15.4 Dopamine 383
  15.5 Experimental Support for the Reward Prediction Error Hypothesis 387
  15.6 TD Error/Dopamine Correspondence 390
  15.7 Neural Actor-Critic 395
  15.8 Actor and Critic Learning Rules 398
  15.9 Hedonistic Neurons 402
  15.10 Collective Reinforcement Learning 404
  15.11 Model-based Methods in the Brain 407
  15.12 Addiction 409
  15.13 Summary 410

16 Applications and Case Studies 421
  16.1 TD-Gammon 421
  16.2 Samuel's Checkers Player 426
  16.3 Watson's Daily-Double Wagering 429
  16.4 Optimizing Memory Control 432
  16.5 Human-level Video Game Play 436
  16.6 Mastering the Game of Go 441
    16.6.1 AlphaGo 444
    16.6.2 AlphaGo Zero 447
  16.7 Personalized Web Services 450
  16.8 Thermal Soaring 453

17 Frontiers 459
  17.1 General Value Functions and Auxiliary Tasks 459
  17.2 Temporal Abstraction via Options 461
  17.3 Observations and State 464
  17.4 Designing Reward Signals 469
  17.5 Remaining Issues 472
  17.6 The Future of Artificial Intelligence 475

References 481
Index 519
|
any_adam_object | 1 |
author | Sutton, Richard S. Barto, Andrew |
author_GND | (DE-588)1099442435 (DE-588)1099442664 |
author_facet | Sutton, Richard S. Barto, Andrew |
author_role | aut aut |
author_sort | Sutton, Richard S. |
author_variant | r s s rs rss a b ab |
building | Verbundindex |
bvnumber | BV045195963 |
callnumber-first | Q - Science |
callnumber-label | Q325 |
callnumber-raw | Q325.6 |
callnumber-search | Q325.6 |
callnumber-sort | Q 3325.6 |
callnumber-subject | Q - General Science |
classification_rvk | ST 285 QH 740 ST 300 ST 302 |
classification_tum | DAT 708f |
ctrlnum | (OCoLC)1050693967 (DE-599)BVBBV045195963 |
dewey-full | 006.3/1 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.3/1 |
dewey-search | 006.3/1 |
dewey-sort | 16.3 11 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik Wirtschaftswissenschaften |
edition | Second edition |
format | Book |
id | DE-604.BV045195963 |
illustrated | Not Illustrated |
indexdate | 2024-11-27T04:01:08Z |
institution | BVB |
isbn | 9780262039246 |
language | English |
lccn | 018023826 |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-030585045 |
oclc_num | 1050693967 |
open_access_boolean | |
owner | DE-384 DE-91G DE-BY-TUM DE-706 DE-739 DE-703 DE-863 DE-BY-FWS DE-523 DE-20 DE-83 DE-945 DE-91 DE-BY-TUM DE-1050 DE-188 DE-29T DE-M347 DE-473 DE-BY-UBG DE-Aug4 DE-92 DE-898 DE-BY-UBR DE-824 DE-1028 DE-860 DE-1043 DE-355 DE-BY-UBR DE-634 DE-861 DE-859 DE-862 DE-BY-FWS |
owner_facet | DE-384 DE-91G DE-BY-TUM DE-706 DE-739 DE-703 DE-863 DE-BY-FWS DE-523 DE-20 DE-83 DE-945 DE-91 DE-BY-TUM DE-1050 DE-188 DE-29T DE-M347 DE-473 DE-BY-UBG DE-Aug4 DE-92 DE-898 DE-BY-UBR DE-824 DE-1028 DE-860 DE-1043 DE-355 DE-BY-UBR DE-634 DE-861 DE-859 DE-862 DE-BY-FWS |
physical | xxii, 526 Seiten Diagramme |
publishDate | 2018 |
publishDateSearch | 2018 |
publishDateSort | 2018 |
publisher | The MIT Press |
record_format | marc |
series2 | Adaptive computation and machine learning |
spellingShingle | Sutton, Richard S. Barto, Andrew Reinforcement learning an introduction Künstliche Intelligenz (DE-588)4033447-8 gnd Maschinelles Lernen (DE-588)4193754-5 gnd |
subject_GND | (DE-588)4033447-8 (DE-588)4193754-5 |
title | Reinforcement learning an introduction |
title_auth | Reinforcement learning an introduction |
title_exact_search | Reinforcement learning an introduction |
title_full | Reinforcement learning an introduction Richard S. Sutton and Andrew G. Barto |
title_fullStr | Reinforcement learning an introduction Richard S. Sutton and Andrew G. Barto |
title_full_unstemmed | Reinforcement learning an introduction Richard S. Sutton and Andrew G. Barto |
title_short | Reinforcement learning |
title_sort | reinforcement learning an introduction |
title_sub | an introduction |
topic | Künstliche Intelligenz (DE-588)4033447-8 gnd Maschinelles Lernen (DE-588)4193754-5 gnd |
topic_facet | Künstliche Intelligenz Maschinelles Lernen |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030585045&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT suttonrichards reinforcementlearninganintroduction AT bartoandrew reinforcementlearninganintroduction |
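The fields listed above form the Solr document that the discovery layer searches. As an illustration only (not a documented interface of this catalogue), a query against such a VuFind-style index could look like the sketch below; the host, port, and the default "biblio" core name are assumptions.

```python
# Illustrative sketch: querying a VuFind-style Solr index for this record.
# The host, port, and core name ("biblio") are assumptions, not a documented
# endpoint of this catalogue.
import json
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "q": "isbn:9780262039246",            # search the isbn field shown above
    "fl": "id,title,author,publishDate",  # restrict the returned fields
    "wt": "json",
})
url = f"http://localhost:8983/solr/biblio/select?{params}"

with urllib.request.urlopen(url) as resp:
    docs = json.load(resp)["response"]["docs"]

for doc in docs:
    print(doc.get("id"), doc.get("title"))  # e.g. DE-604.BV045195963 ...
```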
Table of contents
Special location: Faculty
Call number: |
2000 ST 302 S967(2) |
---|---|
Copy 1 | not loanable (checked out, due back 31.12.2099) |
THWS Würzburg Central Library, Reading Room
Call number: |
1000 ST 302 S967 |
---|---|
Copy 1 | loanable, available |
THWS Schweinfurt Central Library, Reading Room
Call number: |
2000 ST 302 S967(2) |
---|---|
Copy 1 | loanable, available |