Reinforcement learning: an introduction
An account of key ideas and algorithms in reinforcement learning. The discussion ranges from the history of the field's intellectual foundations to recent developments and applications. Areas studied include reinforcement learning problems in terms of Markov decision problems and solution methods.
Saved in:
Main Authors: | Sutton, Richard S., Barto, Andrew |
---|---|
Format: | Book |
Language: | English |
Published: | Cambridge, Mass. [u.a.] : MIT Press, 1998 |
Series: | Adaptive computation and machine learning |
Subjects: | Maschinelles Lernen |
Online Access: | Table of contents |
Summary: | An account of key ideas and algorithms in reinforcement learning. The discussion ranges from the history of the field's intellectual foundations to recent developments and applications. Areas studied include reinforcement learning problems in terms of Markov decision problems and solution methods. |
Description: | Unchanged reprints published later are also covered by this record |
Description: | XVIII, 322 pp., graphs |
ISBN: | 0262193981 9780262193986 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV012485357 | ||
003 | DE-604 | ||
005 | 20180302 | ||
007 | t | ||
008 | 990331s1998 d||| |||| 00||| eng d | ||
020 | |a 0262193981 |9 0-262-19398-1 | ||
020 | |a 9780262193986 |9 978-0-262-19398-6 | ||
035 | |a (OCoLC)263859752 | ||
035 | |a (DE-599)BVBBV012485357 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-29T |a DE-739 |a DE-91G |a DE-706 |a DE-634 |a DE-83 |a DE-11 |a DE-525 |a DE-188 |a DE-863 |a DE-355 |a DE-824 |a DE-20 |a DE-91 |a DE-523 |a DE-703 | ||
084 | |a ST 285 |0 (DE-625)143648: |2 rvk | ||
084 | |a ST 300 |0 (DE-625)143650: |2 rvk | ||
084 | |a ST 304 |0 (DE-625)143653: |2 rvk | ||
084 | |a DAT 708f |2 stub | ||
100 | 1 | |a Sutton, Richard S. |e Verfasser |0 (DE-588)1099442435 |4 aut | |
245 | 1 | 0 | |a Reinforcement learning |b an introduction |c Richard S. Sutton and Andrew G. Barto |
264 | 1 | |a Cambridge, Mass. [u.a.] |b MIT Press |c 1998 | |
300 | |a XVIII, 322 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Adaptive computation and machine learning | |
500 | |a Hier auch später erschienene, unveränderte Nachdrucke | ||
520 | 3 | |a An account of key ideas and algorithms in reinforcement learning. The discussion ranges from the history of the field's intellectual foundations to recent developments and applications. Areas studied include reinforcement learning problems in terms of Markov decision problems and solution methods. | |
650 | 0 | 7 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Barto, Andrew |e Verfasser |0 (DE-588)1099442664 |4 aut | |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=008474462&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-008474462 |
Record in the search index
DE-BY-863_location | 1911 |
---|---|
DE-BY-FWS_call_number | 1911/2015:0756 |
DE-BY-FWS_katkey | 411504 |
DE-BY-FWS_media_number | 083101238598 |
_version_ | 1806176044995051520 |
adam_text | Contents
Series Foreword xiii
Preface xv
I The Problem 1
1 Introduction 3
1.1 Reinforcement Learning 3
1.2 Examples 6
1.3 Elements of Reinforcement Learning 7
1.4 An Extended Example: Tic-Tac-Toe 10
1.5 Summary 15
1.6 History of Reinforcement Learning 16
1.7 Bibliographical Remarks 23
2 Evaluative Feedback 25
2.1 An n-Armed Bandit Problem 26
2.2 Action-Value Methods 27
2.3 Softmax Action Selection 30
2.4 Evaluation Versus Instruction 31
2.5 Incremental Implementation 36
2.6 Tracking a Nonstationary Problem 38
2.7 Optimistic Initial Values 39
*2.8 Reinforcement Comparison 41
*2.9 Pursuit Methods 43
* 2.10 Associative Search 45
2.11 Conclusions 46
2.12 Bibliographical and Historical Remarks 48
3 The Reinforcement Learning Problem 51
3.1 The Agent-Environment Interface 51
3.2 Goals and Rewards 56
3.3 Returns 57
3.4 Unified Notation for Episodic and Continuing Tasks
* 3.5 The Markov Property 61
3.6 Markov Decision Processes 66
3.7 Value Functions 68
3.8 Optimal Value Functions 75
3.9 Optimality and Approximation 80
3.10 Summary 81
3.11 Bibliographical and Historical Remarks 83
II Elementary Solution Methods 87
4 Dynamic Programming 89
4.1 Policy Evaluation 90
4.2 Policy Improvement 93
4.3 Policy Iteration 97
4.4 Value Iteration 100
4.5 Asynchronous Dynamic Programming 103
4.6 Generalized Policy Iteration 105
4.7 Efficiency of Dynamic Programming 107
4.8 Summary 108
4.9 Bibliographical and Historical Remarks 109
5 Monte Carlo Methods 111
5.1 Monte Carlo Policy Evaluation 112
5.2 Monte Carlo Estimation of Action Values 116
5.3 Monte Carlo Control 118
5.4 On-Policy Monte Carlo Control 122
5.5 Evaluating One Policy While Following Another 124
5.6 Off-Policy Monte Carlo Control 126
5.7 Incremental Implementation 128
5.8 Summary 129
5.9 Bibliographical and Historical Remarks 131
6 Temporal-Difference Learning 133
6.1 TD Prediction 133
6.2 Advantages of TD Prediction Methods 138
6.3 Optimality of TD(0) 141
6.4 Sarsa: On-Policy TD Control 145
6.5 Q-Learning: Off-Policy TD Control 148
* 6.6 Actor-Critic Methods 151
* 6.7 R-Learning for Undiscounted Continuing Tasks 153
6.8 Games, Afterstates, and Other Special Cases 156
6.9 Summary 157
6.10 Bibliographical and Historical Remarks 158
III A Unified View 161
7 Eligibility Traces 163
7.1 n-Step TD Prediction 164
7.2 The Forward View of TD(λ) 169
7.3 The Backward View of TD(λ) 173
7.4 Equivalence of Forward and Backward Views 176
7.5 Sarsa(λ) 179
7.6 Q(λ) 182
*7.7 Eligibility Traces for Actor-Critic Methods 185
7.8 Replacing Traces 186
7.9 Implementation Issues 189
*7.10 Variable λ 189
7.11 Conclusions 190
7.12 Bibliographical and Historical Remarks 191
8 Generalization and Function Approximation 193
8.1 Value Prediction with Function Approximation 194
8.2 Gradient-Descent Methods 197
8.3 Linear Methods 200
8.4 Control with Function Approximation 210
8.5 Off-Policy Bootstrapping 216
8.6 Should We Bootstrap? 220
8.7 Summary 222
8.8 Bibliographical and Historical Remarks 223
9 Planning and Learning 227
9.1 Models and Planning 227
9.2 Integrating Planning, Acting, and Learning 230
9.3 When the Model Is Wrong 235
9.4 Prioritized Sweeping 238
9.5 Full vs. Sample Backups 242
9.6 Trajectory Sampling 246
9.7 Heuristic Search 250
9.8 Summary 252
9.9 Bibliographical and Historical Remarks 254
10 Dimensions of Reinforcement Learning 255
10.1 The Unified View 255
10.2 Other Frontier Dimensions 258
11 Case Studies 261
11.1 TD-Gammon 261
11.2 Samuel’s Checkers Player 267
11.3 The Acrobot 270
11.4 Elevator Dispatching 274
11.5 Dynamic Channel Allocation 279
11.6 Job-Shop Scheduling 283
References 291
Summary of Notation 313
Index 315
any_adam_object | 1 |
author | Sutton, Richard S. Barto, Andrew |
author_GND | (DE-588)1099442435 (DE-588)1099442664 |
author_facet | Sutton, Richard S. Barto, Andrew |
author_role | aut aut |
author_sort | Sutton, Richard S. |
author_variant | r s s rs rss a b ab |
building | Verbundindex |
bvnumber | BV012485357 |
classification_rvk | ST 285 ST 300 ST 304 |
classification_tum | DAT 708f |
ctrlnum | (OCoLC)263859752 (DE-599)BVBBV012485357 |
discipline | Informatik |
format | Book |
id | DE-604.BV012485357 |
illustrated | Illustrated |
indexdate | 2024-08-01T11:15:04Z |
institution | BVB |
isbn | 0262193981 9780262193986 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-008474462 |
oclc_num | 263859752 |
open_access_boolean | |
owner | DE-29T DE-739 DE-91G DE-BY-TUM DE-706 DE-634 DE-83 DE-11 DE-525 DE-188 DE-863 DE-BY-FWS DE-355 DE-BY-UBR DE-824 DE-20 DE-91 DE-BY-TUM DE-523 DE-703 |
owner_facet | DE-29T DE-739 DE-91G DE-BY-TUM DE-706 DE-634 DE-83 DE-11 DE-525 DE-188 DE-863 DE-BY-FWS DE-355 DE-BY-UBR DE-824 DE-20 DE-91 DE-BY-TUM DE-523 DE-703 |
physical | XVIII, 322 S. graph. Darst. |
publishDate | 1998 |
publishDateSearch | 1998 |
publishDateSort | 1998 |
publisher | MIT Press |
record_format | marc |
series2 | Adaptive computation and machine learning |
spellingShingle | Sutton, Richard S. Barto, Andrew Reinforcement learning an introduction Maschinelles Lernen (DE-588)4193754-5 gnd |
subject_GND | (DE-588)4193754-5 |
title | Reinforcement learning an introduction |
title_auth | Reinforcement learning an introduction |
title_exact_search | Reinforcement learning an introduction |
title_full | Reinforcement learning an introduction Richard S. Sutton and Andrew G. Barto |
title_fullStr | Reinforcement learning an introduction Richard S. Sutton and Andrew G. Barto |
title_full_unstemmed | Reinforcement learning an introduction Richard S. Sutton and Andrew G. Barto |
title_short | Reinforcement learning |
title_sort | reinforcement learning an introduction |
title_sub | an introduction |
topic | Maschinelles Lernen (DE-588)4193754-5 gnd |
topic_facet | Maschinelles Lernen |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=008474462&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT suttonrichards reinforcementlearninganintroduction AT bartoandrew reinforcementlearninganintroduction |
Table of Contents
THWS Würzburg, closed stacks (Magazin)
Call number: | 1911 2015:0756 |
---|---|
Copy 1 | loanable, available |