Reinforcement learning: state-of-the-art
Other authors:  Wiering, Marco (ed.)
Format:         Book
Language:       English
Published:      Heidelberg [u.a.]: Springer, 2012
Series:         Adaptation, learning, and optimization; 12
Subjects:       Bestärkendes Lernen (Künstliche Intelligenz)
Online access:  Inhaltstext (publisher's description); Inhaltsverzeichnis (table of contents)
Description:    XXXIV, 638 S.: Ill., graph. Darst.
ISBN:           364227644X 9783642276446 9783642446856
Internal format
MARC
LEADER 00000nam a2200000 cb4500
001    BV040270437
003    DE-604
005    20181002
007    t
008    120622s2012 ad|| |||| 00||| eng d
016 7  |a 1018060251 |2 DE-101
020    |a 364227644X |9 3-642-27644-X
020    |a 9783642276446 |c Geb.: EUR 255.73 |9 978-3-642-27644-6
020    |a 9783642446856 |9 978-3-642-44685-6
035    |a (OCoLC)794514278
035    |a (DE-599)DNB1018060251
040    |a DE-604 |b ger
041 0  |a eng
049    |a DE-11 |a DE-473 |a DE-91 |a DE-384 |a DE-706 |a DE-703
082 0  |a 006.31 |2 22/ger
084    |a QH 740 |0 (DE-625)141614: |2 rvk
084    |a ST 300 |0 (DE-625)143650: |2 rvk
084    |a 004 |2 sdnb
245 10 |a Reinforcement learning |b state-of-the-art |c Marco Wiering ... (eds.)
264  1 |a Heidelberg [u.a.] |b Springer |c 2012
300    |a XXXIV, 638 S. |b Ill., graph. Darst.
336    |b txt |2 rdacontent
337    |b n |2 rdamedia
338    |b nc |2 rdacarrier
490 1  |a Adaptation, learning, and optimization |v 12
650 07 |a Bestärkendes Lernen |g Künstliche Intelligenz |0 (DE-588)4825546-4 |2 gnd |9 rswk-swf
655  7 |0 (DE-588)4143413-4 |a Aufsatzsammlung |2 gnd-content
689 00 |a Bestärkendes Lernen |g Künstliche Intelligenz |0 (DE-588)4825546-4 |D s
689 0  |5 DE-604
700 1  |a Wiering, Marco |4 edt
776 08 |i Erscheint auch als |n Online-Ausgabe |z 978-3-642-27645-3
830  0 |a Adaptation, learning, and optimization |v 12 |w (DE-604)BV036521115 |9 12
856 42 |m X:MVB |q text/html |u http://deposit.dnb.de/cgi-bin/dokserv?id=3940593&prov=M&dok%5Fvar=1&dok%5Fext=htm |3 Inhaltstext
856 42 |m DNB Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025126016&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
943 1  |a oai:aleph.bib-bvb.de:BVB01-025126016
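The fields above follow a fixed display layout: a three-character tag, two indicator characters, and subfields introduced by "|" plus a one-character code. The snippet below is a minimal, illustrative Python sketch for splitting such display lines into their parts; `parse_display_field` is a hypothetical helper, not part of any catalog software, and real MARC exchange data (MARCXML or ISO 2709) is better read with a dedicated library such as pymarc.

```python
# Minimal sketch: parse one pipe-delimited MARC *display* line (as shown
# above) into (tag, indicators, subfields). Assumes the layout used in this
# record display; the LEADER and control fields 001-008 carry no subfields
# and are not covered here.
import re

def parse_display_field(line: str):
    """Split e.g. '245 10 |a Reinforcement learning |b state-of-the-art'
    into its tag, indicator pair, and (code, value) subfield list."""
    tag = line[:3]          # columns 0-2: field tag
    indicators = line[4:6]  # columns 4-5: two indicator characters
    subfields = [(m.group(1), m.group(2).strip())
                 for m in re.finditer(r"\|(\w)\s*([^|]*)", line[7:])]
    return tag, indicators, subfields

tag, ind, subs = parse_display_field(
    "245 10 |a Reinforcement learning |b state-of-the-art "
    "|c Marco Wiering ... (eds.)")
print(tag, ind, subs)
# 245 10 [('a', 'Reinforcement learning'), ('b', 'state-of-the-art'),
#         ('c', 'Marco Wiering ... (eds.)')]
```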
Record in the search index
_version_ | 1805146880278003712 |
adam_text |
CONTENTS
PART I INTRODUCTORY PART

1 REINFORCEMENT LEARNING AND MARKOV DECISION PROCESSES 3
MARTIJN VAN OTTERLO, MARCO WIERING
1.1 INTRODUCTION 3
1.2 LEARNING SEQUENTIAL DECISION MAKING 5
1.3 A FORMAL FRAMEWORK 10
1.3.1 MARKOV DECISION PROCESSES 10
1.3.2 POLICIES 13
1.3.3 OPTIMALITY CRITERIA AND DISCOUNTING 13
1.4 VALUE FUNCTIONS AND BELLMAN EQUATIONS 15
1.5 SOLVING MARKOV DECISION PROCESSES 17
1.6 DYNAMIC PROGRAMMING: MODEL-BASED SOLUTION TECHNIQUES 19
1.6.1 FUNDAMENTAL DP ALGORITHMS 20
1.6.2 EFFICIENT DP ALGORITHMS 24
1.7 REINFORCEMENT LEARNING: MODEL-FREE SOLUTION TECHNIQUES 27
1.7.1 TEMPORAL DIFFERENCE LEARNING 29
1.7.2 MONTE CARLO METHODS 33
1.7.3 EFFICIENT EXPLORATION AND VALUE UPDATING 34
1.8 CONCLUSIONS 39
REFERENCES 39
PART II EFFICIENT SOLUTION FRAMEWORKS
2 BATCH REINFORCEMENT LEARNING 45
SASCHA LANGE, THOMAS GABEL, MARTIN RIEDMILLER
2.1 INTRODUCTION 45
2.2 THE BATCH REINFORCEMENT LEARNING PROBLEM 46
2.2.1 THE BATCH LEARNING PROBLEM 46
2.2.2 THE GROWING BATCH LEARNING PROBLEM 48
2.3 FOUNDATIONS OF BATCH RL ALGORITHMS 49
2.4 BATCH RL ALGORITHMS 52
2.4.1 KERNEL-BASED APPROXIMATE DYNAMIC PROGRAMMING 53
2.4.2 FITTED Q ITERATION 55
2.4.3 LEAST-SQUARES POLICY ITERATION 57
2.4.4 IDENTIFYING BATCH ALGORITHMS 58
2.5 THEORY OF BATCH RL 60
2.6 BATCH RL IN PRACTICE 61
2.6.1 NEURAL FITTED Q ITERATION (NFQ) 61
2.6.2 NFQ IN CONTROL APPLICATIONS 63
2.6.3 BATCH RL FOR LEARNING IN MULTI-AGENT SYSTEMS 65
2.6.4 DEEP FITTED Q ITERATION 67
2.6.5 APPLICATIONS / FURTHER REFERENCES 69
2.7 SUMMARY 70
REFERENCES 71
3 LEAST-SQUARES METHODS FOR POLICY ITERATION 75
LUCIAN BUŞONIU, ALESSANDRO LAZARIC, MOHAMMAD GHAVAMZADEH, RÉMI MUNOS, ROBERT BABUŠKA, BART DE SCHUTTER
3.1 INTRODUCTION 76
3.2 PRELIMINARIES: CLASSICAL POLICY ITERATION 77
3.3 LEAST-SQUARES METHODS FOR APPROXIMATE POLICY EVALUATION 79
3.3.1 MAIN PRINCIPLES AND TAXONOMY 79
3.3.2 THE LINEAR CASE AND MATRIX FORM OF THE EQUATIONS 81
3.3.3 MODEL-FREE IMPLEMENTATIONS 85
3.3.4 BIBLIOGRAPHICAL NOTES 89
3.4 ONLINE LEAST-SQUARES POLICY ITERATION 89
3.5 EXAMPLE: CAR ON THE HILL 91
3.6 PERFORMANCE GUARANTEES 94
3.6.1 ASYMPTOTIC CONVERGENCE AND GUARANTEES 95
3.6.2 FINITE-SAMPLE GUARANTEES 98
3.7 FURTHER READING 104
REFERENCES 106
4 LEARNING AND USING MODELS 111
TODD HESTER, PETER STONE
4.1 INTRODUCTION 112
4.2 WHAT IS A MODEL? 113
4.3 PLANNING 115
4.3.1 MONTE CARLO METHODS 115
4.4 COMBINING MODELS AND PLANNING 118
4.5 SAMPLE COMPLEXITY 120
4.6 FACTORED DOMAINS 122
4.7 EXPLORATION 126
4.8 CONTINUOUS DOMAINS 130
4.9 EMPIRICAL COMPARISONS 133
4.10 SCALING UP 135
4.11 CONCLUSION 137
REFERENCES 138
5 TRANSFER IN REINFORCEMENT LEARNING: A FRAMEWORK AND A SURVEY 143
ALESSANDRO LAZARIC
5.1 INTRODUCTION 143
5.2 A FRAMEWORK AND A TAXONOMY FOR TRANSFER IN REINFORCEMENT LEARNING 145
5.2.1 TRANSFER FRAMEWORK 145
5.2.2 TAXONOMY 148
5.3 METHODS FOR TRANSFER FROM SOURCE TO TARGET WITH A FIXED STATE-ACTION SPACE 155
5.3.1 PROBLEM FORMULATION 155
5.3.2 REPRESENTATION TRANSFER 156
5.3.3 PARAMETER TRANSFER 158
5.4 METHODS FOR TRANSFER ACROSS TASKS WITH A FIXED STATE-ACTION SPACE 159
5.4.1 PROBLEM FORMULATION 159
5.4.2 INSTANCE TRANSFER 160
5.4.3 REPRESENTATION TRANSFER 161
5.4.4 PARAMETER TRANSFER 162
5.5 METHODS FOR TRANSFER FROM SOURCE TO TARGET TASKS WITH A DIFFERENT STATE-ACTION SPACE 164
5.5.1 PROBLEM FORMULATION 164
5.5.2 INSTANCE TRANSFER 166
5.5.3 REPRESENTATION TRANSFER 166
5.5.4 PARAMETER TRANSFER 167
5.6 CONCLUSIONS AND OPEN QUESTIONS 168
REFERENCES 169
6 SAMPLE COMPLEXITY BOUNDS OF EXPLORATION 175
LIHONG LI
6.1 INTRODUCTION 175
6.2 PRELIMINARIES 176
6.3 FORMALIZING EXPLORATION EFFICIENCY 178
6.3.1 SAMPLE COMPLEXITY OF EXPLORATION AND PAC-MDP 178
6.3.2 REGRET MINIMIZATION 180
6.3.3 AVERAGE LOSS 182
6.3.4 BAYESIAN FRAMEWORK 183
6.4 A GENERIC PAC-MDP THEOREM 184
6.5 MODEL-BASED APPROACHES 186
6.5.1 RMAX 186
6.5.2 A GENERALIZATION OF RMAX 188
6.6 MODEL-FREE APPROACHES 196
6.7 CONCLUDING REMARKS 199
REFERENCES 200
PART III CONSTRUCTIVE-REPRESENTATIONAL DIRECTIONS
7 REINFORCEMENT LEARNING IN CONTINUOUS STATE AND ACTION SPACES 207
HADO VAN HASSELT
7.1 INTRODUCTION 207
7.1.1 MARKOV DECISION PROCESSES IN CONTINUOUS SPACES 208
7.1.2 METHODOLOGIES TO SOLVE A CONTINUOUS MDP 211
7.2 FUNCTION APPROXIMATION 212
7.2.1 LINEAR FUNCTION APPROXIMATION 213
7.2.2 NON-LINEAR FUNCTION APPROXIMATION 217
7.2.3 UPDATING PARAMETERS 218
7.3 APPROXIMATE REINFORCEMENT LEARNING 223
7.3.1 VALUE APPROXIMATION 223
7.3.2 POLICY APPROXIMATION 229
7.4 AN EXPERIMENT ON A DOUBLE-POLE CART POLE 238
7.5 CONCLUSION 242
REFERENCES 243
8 SOLVING RELATIONAL AND FIRST-ORDER LOGICAL MARKOV DECISION PROCESSES: A SURVEY 253
MARTIJN VAN OTTERLO
8.1 INTRODUCTION TO SEQUENTIAL DECISIONS IN RELATIONAL WORLDS 253
8.1.1 MDPS: REPRESENTATION AND GENERALIZATION 254
8.1.2 SHORT HISTORY AND CONNECTIONS TO OTHER FIELDS 256
8.2 EXTENDING MDPS WITH OBJECTS AND RELATIONS 257
8.2.1 RELATIONAL REPRESENTATIONS AND LOGICAL GENERALIZATION 257
8.2.2 RELATIONAL MARKOV DECISION PROCESSES 258
8.2.3 ABSTRACT PROBLEMS AND SOLUTIONS 259
8.3 MODEL-BASED SOLUTION TECHNIQUES 261
8.3.1 THE STRUCTURE OF BELLMAN BACKUPS 262
8.3.2 EXACT MODEL-BASED ALGORITHMS 263
8.3.3 APPROXIMATE MODEL-BASED ALGORITHMS 266
8.4 MODEL-FREE SOLUTIONS 268
8.4.1 VALUE-FUNCTION LEARNING WITH FIXED GENERALIZATION 269
8.4.2 VALUE FUNCTIONS WITH ADAPTIVE GENERALIZATION 270
8.4.3 POLICY-BASED SOLUTION TECHNIQUES 274
8.5 MODELS, HIERARCHIES, AND BIAS 276
8.6 CURRENT DEVELOPMENTS 280
8.7 CONCLUSIONS AND OUTLOOK 283
REFERENCES 283
9 HIERARCHICAL APPROACHES 293
BERNHARD HENGST
9.1 INTRODUCTION 293
9.2 BACKGROUND 296
9.2.1 ABSTRACT ACTIONS 297
9.2.2 SEMI-MARKOV DECISION PROBLEMS 297
9.2.3 STRUCTURE 300
9.2.4 STATE ABSTRACTION 301
9.2.5 VALUE-FUNCTION DECOMPOSITION 303
9.2.6 OPTIMALITY 303
9.3 APPROACHES TO HIERARCHICAL REINFORCEMENT LEARNING (HRL) 305
9.3.1 OPTIONS 306
9.3.2 HAMQ-LEARNING 307
9.3.3 MAXQ 309
9.4 LEARNING STRUCTURE 313
9.4.1 HEXQ 315
9.5 RELATED WORK AND ONGOING RESEARCH 317
9.6 SUMMARY 319
REFERENCES 319
10 EVOLUTIONARY COMPUTATION FOR REINFORCEMENT LEARNING 325
SHIMON WHITESON
10.1 INTRODUCTION 325
10.2 NEUROEVOLUTION 328
10.3 TWEANNS 330
10.3.1 CHALLENGES 332
10.3.2 NEAT 333
10.4 HYBRIDS 334
10.4.1 EVOLUTIONARY FUNCTION APPROXIMATION 335
10.4.2 XCS 336
10.5 COEVOLUTION 339
10.5.1 COOPERATIVE COEVOLUTION 339
10.5.2 COMPETITIVE COEVOLUTION 342
10.6 GENERATIVE AND DEVELOPMENTAL SYSTEMS 343
10.7 ON-LINE METHODS 345
10.7.1 MODEL-BASED METHODS 345
10.7.2 ON-LINE EVOLUTIONARY COMPUTATION 346
10.8 CONCLUSION 347
REFERENCES 348
PART IV PROBABILISTIC MODELS OF SELF AND OTHERS
11 BAYESIAN REINFORCEMENT LEARNING 359
NIKOS VLASSIS, MOHAMMAD GHAVAMZADEH, SHIE MANNOR, PASCAL POUPART
11.1 INTRODUCTION 359
11.2 MODEL-FREE BAYESIAN REINFORCEMENT LEARNING 361
11.2.1 VALUE-FUNCTION BASED ALGORITHMS 361
11.2.2 POLICY GRADIENT ALGORITHMS 365
11.2.3 ACTOR-CRITIC ALGORITHMS 369
11.3 MODEL-BASED BAYESIAN REINFORCEMENT LEARNING 372
11.3.1 POMDP FORMULATION OF BAYESIAN RL 372
11.3.2 BAYESIAN RL VIA DYNAMIC PROGRAMMING 373
11.3.3 APPROXIMATE ONLINE ALGORITHMS 376
11.3.4 BAYESIAN MULTI-TASK REINFORCEMENT LEARNING 377
11.3.5 INCORPORATING PRIOR KNOWLEDGE 379
11.4 FINITE SAMPLE ANALYSIS AND COMPLEXITY ISSUES 380
11.5 SUMMARY AND DISCUSSION 382
REFERENCES 382
12 PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES 387
MATTHIJS T.J. SPAAN
12.1 INTRODUCTION 387
12.2 DECISION MAKING IN PARTIALLY OBSERVABLE ENVIRONMENTS 389
12.2.1 POMDP MODEL 389
12.2.2 CONTINUOUS AND STRUCTURED REPRESENTATIONS 391
12.2.3 MEMORY FOR OPTIMAL DECISION MAKING 391
12.2.4 POLICIES AND VALUE FUNCTIONS 394
12.3 MODEL-BASED TECHNIQUES 395
12.3.1 HEURISTICS BASED ON MDP SOLUTIONS 396
12.3.2 VALUE ITERATION FOR POMDPS 397
12.3.3 EXACT VALUE ITERATION 400
12.3.4 POINT-BASED VALUE ITERATION METHODS 401
12.3.5 OTHER APPROXIMATE METHODS 403
12.4 DECISION MAKING WITHOUT A-PRIORI MODELS 404
12.4.1 MEMORYLESS TECHNIQUES 405
12.4.2 LEARNING INTERNAL MEMORY 405
12.5 RECENT TRENDS 408
REFERENCES 409
13 PREDICTIVELY DEFINED REPRESENTATIONS OF STATE 415
DAVID WINGATE
13.1 INTRODUCTION 416
13.1.1 WHAT IS "STATE"? 416
13.1.2 WHICH REPRESENTATION OF STATE? 418
13.1.3 WHY PREDICTIONS ABOUT THE FUTURE? 419
13.2 PSRS 420
13.2.1 HISTORIES AND TESTS 421
13.2.2 PREDICTION OF A TEST 422
13.2.3 THE SYSTEM DYNAMICS VECTOR 422
13.2.4 THE SYSTEM DYNAMICS MATRIX 423
13.2.5 SUFFICIENT STATISTICS 424
13.2.6 STATE 424
13.2.7 STATE UPDATE 425
13.2.8 LINEAR PSRS 425
13.2.9 RELATING LINEAR PSRS TO POMDPS 426
13.2.10 THEORETICAL RESULTS ON LINEAR PSRS 427
13.3 LEARNING A PSR MODEL 428
13.3.1 THE DISCOVERY PROBLEM 428
13.3.2 THE LEARNING PROBLEM 429
13.3.3 ESTIMATING THE SYSTEM DYNAMICS MATRIX 429
13.4 PLANNING WITH PSRS 429
13.5 EXTENSIONS OF PSRS 431
13.6 OTHER MODELS WITH PREDICTIVELY DEFINED STATE 432
13.6.1 OBSERVABLE OPERATOR MODELS 433
13.6.2 THE PREDICTIVE LINEAR-GAUSSIAN MODEL 433
13.6.3 TEMPORAL-DIFFERENCE NETWORKS 434
13.6.4 DIVERSITY AUTOMATON 435
13.6.5 THE EXPONENTIAL FAMILY PSR 435
13.6.6 TRANSFORMED PSRS 436
13.7 CONCLUSION 436
REFERENCES 437
14 GAME THEORY AND MULTI-AGENT REINFORCEMENT LEARNING 441
ANN NOWÉ, PETER VRANCX, YANN-MICHAËL DE HAUWERE
14.1 INTRODUCTION 441
14.2 REPEATED GAMES 445
14.2.1 GAME THEORY 445
14.2.2 REINFORCEMENT LEARNING IN REPEATED GAMES 449
14.3 SEQUENTIAL GAMES 454
14.3.1 MARKOV GAMES 455
14.3.2 REINFORCEMENT LEARNING IN MARKOV GAMES 456
14.4 SPARSE INTERACTIONS IN MULTI-AGENT SYSTEMS 461
14.4.1 LEARNING ON MULTIPLE LEVELS 461
14.4.2 LEARNING TO COORDINATE WITH SPARSE INTERACTIONS 462
14.5 FURTHER READING 467
REFERENCES 467
15 DECENTRALIZED POMDPS 471
FRANS A. OLIEHOEK
15.1 INTRODUCTION 471
15.2 THE DECENTRALIZED POMDP FRAMEWORK 473
15.3 HISTORIES AND POLICIES 475
15.3.1 HISTORIES 475
15.3.2 POLICIES 476
15.3.3 STRUCTURE IN POLICIES 477
15.3.4 THE QUALITY OF JOINT POLICIES 479
15.4 SOLUTION OF FINITE-HORIZON DEC-POMDPS 480
15.4.1 BRUTE FORCE SEARCH AND DEC-POMDP COMPLEXITY 480
15.4.2 ALTERNATING MAXIMIZATION 481
15.4.3 OPTIMAL VALUE FUNCTIONS FOR DEC-POMDPS 481
15.4.4 FORWARD APPROACH: HEURISTIC SEARCH 485
15.4.5 BACKWARDS APPROACH: DYNAMIC PROGRAMMING 489
15.4.6 OTHER FINITE-HORIZON METHODS 493
15.5 FURTHER TOPICS 493
15.5.1 GENERALIZATION AND SPECIAL CASES 493
15.5.2 INFINITE-HORIZON DEC-POMDPS 495
15.5.3 REINFORCEMENT LEARNING 496
15.5.4 COMMUNICATION 497
REFERENCES 498
PART V DOMAINS AND BACKGROUND
16 PSYCHOLOGICAL AND NEUROSCIENTIFIC CONNECTIONS WITH REINFORCEMENT LEARNING 507
ASHVIN SHAH
16.1 INTRODUCTION 507
16.2 CLASSICAL (OR PAVLOVIAN) CONDITIONING 508
16.2.1 BEHAVIOR 509
16.2.2 THEORY 511
16.2.3 SUMMARY AND ADDITIONAL CONSIDERATIONS 512
16.3 OPERANT (OR INSTRUMENTAL) CONDITIONING 513
16.3.1 BEHAVIOR 513
16.3.2 THEORY 514
16.3.3 MODEL-BASED VERSUS MODEL-FREE CONTROL 516
16.3.4 SUMMARY AND ADDITIONAL CONSIDERATIONS 517
16.4 DOPAMINE 518
16.4.1 DOPAMINE AS A REWARD PREDICTION ERROR 518
16.4.2 DOPAMINE AS A GENERAL REINFORCEMENT SIGNAL 520
16.4.3 SUMMARY AND ADDITIONAL CONSIDERATIONS 521
16.5 THE BASAL GANGLIA 521
16.5.1 OVERVIEW OF THE BASAL GANGLIA 522
16.5.2 NEURAL ACTIVITY IN THE STRIATUM 523
16.5.3 CORTICO-BASAL GANGLIA-THALAMIC LOOPS 524
16.5.4 SUMMARY AND ADDITIONAL CONSIDERATIONS 526
16.6 CHAPTER SUMMARY 527
REFERENCES 528
17 REINFORCEMENT LEARNING IN GAMES 539
ISTVÁN SZITA
17.1 INTRODUCTION 539
17.1.1 AIMS AND STRUCTURE 540
17.1.2 SCOPE 541
17.2 A SHOWCASE OF GAMES 541
17.2.1 BACKGAMMON 542
17.2.2 CHESS 545
17.2.3 GO 550
17.2.4 TETRIS 555
17.2.5 REAL-TIME STRATEGY GAMES 558
17.3 CHALLENGES OF APPLYING REINFORCEMENT LEARNING TO GAMES 561
17.3.1 REPRESENTATION DESIGN 561
17.3.2 EXPLORATION 564
17.3.3 SOURCE OF TRAINING DATA 565
17.3.4 DEALING WITH MISSING INFORMATION 566
17.3.5 OPPONENT MODELLING 567
17.4 USING RL IN GAMES 568
17.4.1 OPPONENTS THAT MAXIMIZE FUN 568
17.4.2 DEVELOPMENT-TIME LEARNING 570
17.5 CLOSING REMARKS 571
REFERENCES 572
18 REINFORCEMENT LEARNING IN ROBOTICS: A SURVEY 579
JENS KOBER, JAN PETERS
18.1 INTRODUCTION 579
18.2 CHALLENGES IN ROBOT REINFORCEMENT LEARNING 581
18.2.1 CURSE OF DIMENSIONALITY 582
18.2.2 CURSE OF REAL-WORLD SAMPLES 583
18.2.3 CURSE OF REAL-WORLD INTERACTIONS 584
18.2.4 CURSE OF MODEL ERRORS 584
18.2.5 CURSE OF GOAL SPECIFICATION 585
18.3 FOUNDATIONS OF ROBOT REINFORCEMENT LEARNING 585
18.3.1 VALUE FUNCTION APPROACHES 586
18.3.2 POLICY SEARCH 588
18.4 TRACTABILITY THROUGH REPRESENTATION 589
18.4.1 SMART STATE-ACTION DISCRETIZATION 590
18.4.2 FUNCTION APPROXIMATION 592
18.4.3 PRE-STRUCTURED POLICIES 592
18.5 TRACTABILITY THROUGH PRIOR KNOWLEDGE 594
18.5.1 PRIOR KNOWLEDGE THROUGH DEMONSTRATIONS 594
18.5.2 PRIOR KNOWLEDGE THROUGH TASK STRUCTURING 596
18.5.3 DIRECTING EXPLORATION WITH PRIOR KNOWLEDGE 596
18.6 TRACTABILITY THROUGH SIMULATION 596
18.6.1 ROLE OF MODELS 597
18.6.2 MENTAL REHEARSAL 598
18.6.3 DIRECT TRANSFER FROM SIMULATED TO REAL ROBOTS 599
18.7 A CASE STUDY: BALL-IN-A-CUP 599
18.7.1 EXPERIMENTAL SETTING: TASK AND REWARD 599
18.7.2 APPROPRIATE POLICY REPRESENTATION 601
18.7.3 GENERATING A TEACHER'S DEMONSTRATION 601
18.7.4 REINFORCEMENT LEARNING BY POLICY SEARCH 601
18.7.5 USE OF SIMULATIONS IN ROBOT REINFORCEMENT LEARNING 603
18.7.6 ALTERNATIVE APPROACH WITH VALUE FUNCTION METHODS 603
18.8 CONCLUSION 603
REFERENCES 604
PART VI CLOSING
19 CONCLUSIONS, FUTURE DIRECTIONS AND OUTLOOK 613
MARCO WIERING, MARTIJN VAN OTTERLO
19.1 LOOKING BACK 613
19.1.1 WHAT HAS BEEN ACCOMPLISHED? 613
19.1.2 WHICH TOPICS WERE NOT INCLUDED? 614
19.2 LOOKING INTO THE FUTURE 620
19.2.1 THINGS THAT ARE NOT YET KNOWN 620
19.2.2 SEEMINGLY IMPOSSIBLE APPLICATIONS FOR RL 622
19.2.3 INTERESTING DIRECTIONS 623
19.2.4 EXPERTS ON FUTURE DEVELOPMENTS 624
REFERENCES 626
INDEX 631
any_adam_object | 1 |
author2 | Wiering, Marco |
author2_role | edt |
author2_variant | m w mw |
author_facet | Wiering, Marco |
building | Verbundindex |
bvnumber | BV040270437 |
classification_rvk | QH 740 ST 300 |
ctrlnum | (OCoLC)794514278 (DE-599)DNB1018060251 |
dewey-full | 006.31 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.31 |
dewey-search | 006.31 |
dewey-sort | 16.31 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik Wirtschaftswissenschaften |
format | Book |
genre | (DE-588)4143413-4 Aufsatzsammlung gnd-content |
genre_facet | Aufsatzsammlung |
id | DE-604.BV040270437 |
illustrated | Illustrated |
indexdate | 2024-07-21T00:36:55Z |
institution | BVB |
isbn | 364227644X 9783642276446 9783642446856 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-025126016 |
oclc_num | 794514278 |
open_access_boolean | |
owner | DE-11 DE-473 DE-BY-UBG DE-91 DE-BY-TUM DE-384 DE-706 DE-703 |
owner_facet | DE-11 DE-473 DE-BY-UBG DE-91 DE-BY-TUM DE-384 DE-706 DE-703 |
physical | XXXIV, 638 S. Ill., graph. Darst. |
publishDate | 2012 |
publishDateSearch | 2012 |
publishDateSort | 2012 |
publisher | Springer |
record_format | marc |
series | Adaptation, learning, and optimization |
series2 | Adaptation, learning, and optimization |
spelling | Reinforcement learning state-of-the-art Marco Wiering ... (eds.) Heidelberg [u.a.] Springer 2012 XXXIV, 638 S. Ill., graph. Darst. txt rdacontent n rdamedia nc rdacarrier Adaptation, learning, and optimization 12 Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd rswk-swf (DE-588)4143413-4 Aufsatzsammlung gnd-content Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 s DE-604 Wiering, Marco edt Erscheint auch als Online-Ausgabe 978-3-642-27645-3 Adaptation, learning, and optimization 12 (DE-604)BV036521115 12 X:MVB text/html http://deposit.dnb.de/cgi-bin/dokserv?id=3940593&prov=M&dok%5Fvar=1&dok%5Fext=htm Inhaltstext DNB Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025126016&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Reinforcement learning state-of-the-art Adaptation, learning, and optimization Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd |
subject_GND | (DE-588)4825546-4 (DE-588)4143413-4 |
title | Reinforcement learning state-of-the-art |
title_auth | Reinforcement learning state-of-the-art |
title_exact_search | Reinforcement learning state-of-the-art |
title_full | Reinforcement learning state-of-the-art Marco Wiering ... (eds.) |
title_fullStr | Reinforcement learning state-of-the-art Marco Wiering ... (eds.) |
title_full_unstemmed | Reinforcement learning state-of-the-art Marco Wiering ... (eds.) |
title_short | Reinforcement learning |
title_sort | reinforcement learning state of the art |
title_sub | state-of-the-art |
topic | Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd |
topic_facet | Bestärkendes Lernen Künstliche Intelligenz Aufsatzsammlung |
url | http://deposit.dnb.de/cgi-bin/dokserv?id=3940593&prov=M&dok%5Fvar=1&dok%5Fext=htm http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025126016&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
volume_link | (DE-604)BV036521115 |
work_keys_str_mv | AT wieringmarco reinforcementlearningstateoftheart |