Reinforcement learning: an introduction
An account of key ideas and algorithms in reinforcement learning. The discussion ranges from the history of the field's intellectual foundations to recent developments and applications. Areas studied include reinforcement learning problems in terms of Markov decision problems and solution methods.
Saved in:
Main Authors: | Sutton, Richard S., Barto, Andrew |
---|---|
Format: | Book |
Language: | English |
Published: | Cambridge, Mass. [u.a.] : MIT Press, 1998 |
Series: | Adaptive computation and machine learning |
Subjects: | Maschinelles Lernen |
Online Access: | Table of contents |
Summary: | An account of key ideas and algorithms in reinforcement learning. The discussion ranges from the history of the field's intellectual foundations to recent developments and applications. Areas studied include reinforcement learning problems in terms of Markov decision problems and solution methods. |
Description: | Unchanged reprints published later are also covered by this record |
Description: | XVIII, 322 pp., graphs |
ISBN: | 0262193981 9780262193986 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV012485357 | ||
003 | DE-604 | ||
005 | 20180302 | ||
007 | t | ||
008 | 990331s1998 d||| |||| 00||| eng d | ||
020 | |a 0262193981 |9 0-262-19398-1 | ||
020 | |a 9780262193986 |9 978-0-262-19398-6 | ||
035 | |a (OCoLC)263859752 | ||
035 | |a (DE-599)BVBBV012485357 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-29T |a DE-739 |a DE-91G |a DE-706 |a DE-634 |a DE-83 |a DE-11 |a DE-525 |a DE-188 |a DE-863 |a DE-355 |a DE-824 |a DE-20 |a DE-91 |a DE-523 |a DE-703 | ||
084 | |a ST 285 |0 (DE-625)143648: |2 rvk | ||
084 | |a ST 300 |0 (DE-625)143650: |2 rvk | ||
084 | |a ST 304 |0 (DE-625)143653: |2 rvk | ||
084 | |a DAT 708f |2 stub | ||
100 | 1 | |a Sutton, Richard S. |e Verfasser |0 (DE-588)1099442435 |4 aut | |
245 | 1 | 0 | |a Reinforcement learning |b an introduction |c Richard S. Sutton and Andrew G. Barto |
264 | 1 | |a Cambridge, Mass. [u.a.] |b MIT Press |c 1998 | |
300 | |a XVIII, 322 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Adaptive computation and machine learning | |
500 | |a Hier auch später erschienene, unveränderte Nachdrucke | ||
520 | 3 | |a An account of key ideas and algorithms in reinforcement learning. The discussion ranges from the history of the field's intellectual foundations to recent developments and applications. Areas studied include reinforcement learning problems in terms of Markov decision problems and solution methods. | |
650 | 0 | 7 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Barto, Andrew |e Verfasser |0 (DE-588)1099442664 |4 aut | |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=008474462&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-008474462 |
Record in the search index
DE-BY-863_location | 1911 |
---|---|
DE-BY-FWS_call_number | 1911/2015:0756 |
DE-BY-FWS_katkey | 411504 |
DE-BY-FWS_media_number | 083101238598 |
_version_ | 1806176044995051520 |
adam_text | Contents
Series Foreword xiii
Preface xv
I The Problem 1
1 Introduction 3
1.1 Reinforcement Learning 3
1.2 Examples 6
1.3 Elements of Reinforcement Learning 7
1.4 An Extended Example: Tic-Tac-Toe 10
1.5 Summary 15
1.6 History of Reinforcement Learning 16
1.7 Bibliographical Remarks 23
2 Evaluative Feedback 25
2.1 An n-Armed Bandit Problem 26
2.2 Action-Value Methods 27
2.3 Softmax Action Selection 30
2.4 Evaluation Versus Instruction 31
2.5 Incremental Implementation 36
2.6 Tracking a Nonstationary Problem 38
2.7 Optimistic Initial Values 39
*2.8 Reinforcement Comparison 41
*2.9 Pursuit Methods 43
* 2.10 Associative Search 45
2.11 Conclusions 46
2.12 Bibliographical and Historical Remarks 48
3 The Reinforcement Learning Problem 51
3.1 The Agent-Environment Interface 51
3.2 Goals and Rewards 56
3.3 Returns 57
3.4 Unified Notation for Episodic and Continuing Tasks
* 3.5 The Markov Property 61
3.6 Markov Decision Processes 66
3.7 Value Functions 68
3.8 Optimal Value Functions 75
3.9 Optimality and Approximation 80
3.10 Summary 81
3.11 Bibliographical and Historical Remarks 83
II Elementary Solution Methods 87
4 Dynamic Programming 89
4.1 Policy Evaluation 90
4.2 Policy Improvement 93
4.3 Policy Iteration 97
4.4 Value Iteration 100
4.5 Asynchronous Dynamic Programming 103
4.6 Generalized Policy Iteration 105
4.7 Efficiency of Dynamic Programming 107
4.8 Summary 108
4.9 Bibliographical and Historical Remarks 109
5 Monte Carlo Methods 111
5.1 Monte Carlo Policy Evaluation 112
5.2 Monte Carlo Estimation of Action Values 116
5.3 Monte Carlo Control 118
5.4 On-Policy Monte Carlo Control 122
5.5 Evaluating One Policy While Following Another 124
5.6 Off-Policy Monte Carlo Control 126
5.7 Incremental Implementation 128
5.8 Summary 129
5.9 Bibliographical and Historical Remarks 131
6 Temporal-Difference Learning 133
6.1 TD Prediction 133
6.2 Advantages of TD Prediction Methods 138
6.3 Optimality of TD(0) 141
6.4 Sarsa: On-Policy TD Control 145
6.5 Q-Learning: Off-Policy TD Control 148
* 6.6 Actor-Critic Methods 151
* 6.7 R-Learning for Undiscounted Continuing Tasks 153
6.8 Games, Afterstates, and Other Special Cases 156
6.9 Summary 157
6.10 Bibliographical and Historical Remarks 158
III A Unified View 161
7 Eligibility Traces 163
7.1 n-Step TD Prediction 164
7.2 The Forward View of TD(λ) 169
7.3 The Backward View of TD(λ) 173
7.4 Equivalence of Forward and Backward Views 176
7.5 Sarsa(λ) 179
7.6 Q(λ) 182
*7.7 Eligibility Traces for Actor-Critic Methods 185
7.8 Replacing Traces 186
7.9 Implementation Issues 189
*7.10 Variable λ 189
7.11 Conclusions 190
7.12 Bibliographical and Historical Remarks 191
8 Generalization and Function Approximation 193
8.1 Value Prediction with Function Approximation 194
8.2 Gradient-Descent Methods 197
8.3 Linear Methods 200
8.4 Control with Function Approximation 210
8.5 Off-Policy Bootstrapping 216
8.6 Should We Bootstrap? 220
8.7 Summary 222
8.8 Bibliographical and Historical Remarks 223
9 Planning and Learning 227
9.1 Models and Planning 227
9.2 Integrating Planning, Acting, and Learning 230
9.3 When the Model Is Wrong 235
9.4 Prioritized Sweeping 238
9.5 Full vs. Sample Backups 242
9.6 Trajectory Sampling 246
9.7 Heuristic Search 250
9.8 Summary 252
9.9 Bibliographical and Historical Remarks 254
10 Dimensions of Reinforcement Learning 255
10.1 The Unified View 255
10.2 Other Frontier Dimensions 258
11 Case Studies 261
11.1 TD-Gammon 261
11.2 Samuel’s Checkers Player 267
11.3 The Acrobot 270
11.4 Elevator Dispatching 274
11.5 Dynamic Channel Allocation 279
11.6 Job-Shop Scheduling 283
References 291
Summary of Notation 313
Index 315
any_adam_object | 1 |
author | Sutton, Richard S. Barto, Andrew |
author_GND | (DE-588)1099442435 (DE-588)1099442664 |
author_facet | Sutton, Richard S. Barto, Andrew |
author_role | aut aut |
author_sort | Sutton, Richard S. |
author_variant | r s s rs rss a b ab |
building | Verbundindex |
bvnumber | BV012485357 |
classification_rvk | ST 285 ST 300 ST 304 |
classification_tum | DAT 708f |
ctrlnum | (OCoLC)263859752 (DE-599)BVBBV012485357 |
discipline | Informatik |
format | Book |
id | DE-604.BV012485357 |
illustrated | Illustrated |
indexdate | 2024-08-01T11:15:04Z |
institution | BVB |
isbn | 0262193981 9780262193986 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-008474462 |
oclc_num | 263859752 |
open_access_boolean | |
owner | DE-29T DE-739 DE-91G DE-BY-TUM DE-706 DE-634 DE-83 DE-11 DE-525 DE-188 DE-863 DE-BY-FWS DE-355 DE-BY-UBR DE-824 DE-20 DE-91 DE-BY-TUM DE-523 DE-703 |
owner_facet | DE-29T DE-739 DE-91G DE-BY-TUM DE-706 DE-634 DE-83 DE-11 DE-525 DE-188 DE-863 DE-BY-FWS DE-355 DE-BY-UBR DE-824 DE-20 DE-91 DE-BY-TUM DE-523 DE-703 |
physical | XVIII, 322 S. graph. Darst. |
publishDate | 1998 |
publishDateSearch | 1998 |
publishDateSort | 1998 |
publisher | MIT Press |
record_format | marc |
series2 | Adaptive computation and machine learning |
spellingShingle | Sutton, Richard S. Barto, Andrew Reinforcement learning an introduction Maschinelles Lernen (DE-588)4193754-5 gnd |
subject_GND | (DE-588)4193754-5 |
title | Reinforcement learning an introduction |
title_auth | Reinforcement learning an introduction |
title_exact_search | Reinforcement learning an introduction |
title_full | Reinforcement learning an introduction Richard S. Sutton and Andrew G. Barto |
title_fullStr | Reinforcement learning an introduction Richard S. Sutton and Andrew G. Barto |
title_full_unstemmed | Reinforcement learning an introduction Richard S. Sutton and Andrew G. Barto |
title_short | Reinforcement learning |
title_sort | reinforcement learning an introduction |
title_sub | an introduction |
topic | Maschinelles Lernen (DE-588)4193754-5 gnd |
topic_facet | Maschinelles Lernen |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=008474462&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT suttonrichards reinforcementlearninganintroduction AT bartoandrew reinforcementlearninganintroduction |
Table of Contents
THWS Würzburg, closed stacks (Magazin)
Call number: | 1911 2015:0756 |
---|---|
Copy 1 | loanable, available |