Reinforcement learning and approximate dynamic programming for feedback control:
"Reinforcement learning (RL) and adaptive dynamic programming (ADP) has been one of the most critical research fields in science and engineering for modern complex systems. This book describes the latest RL and ADP techniques for decision and control in human engineered systems, covering both s...
Gespeichert in:
Other Authors: | Lewis, Frank L. 1949- (Editor) |
---|---|
Format: | Book |
Language: | English |
Published: | Piscataway, NJ : IEEE Press, [2013]; Hoboken, NJ : Wiley |
Series: | IEEE Press series on computational intelligence |
Subjects: | Reinforcement learning; Feedback control systems; Bestärkendes Lernen (Künstliche Intelligenz); Dynamische Optimierung |
Online Access: | Table of contents |
Summary: | "Reinforcement learning (RL) and adaptive dynamic programming (ADP) has been one of the most critical research fields in science and engineering for modern complex systems. This book describes the latest RL and ADP techniques for decision and control in human engineered systems, covering both single player decision and control and multi-player games. Edited by the pioneers of RL and ADP research, the book brings together ideas and methods from many fields and provides an important and timely guidance on controlling a wide variety of systems, such as robots, industrial processes, and economic decision-making"-- |
Description: | Includes bibliographical references and index |
Description: | xxvi, 613 pages; illustrations, diagrams; 24 cm |
ISBN: | 9781118104200 111810420X |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV040755567 | ||
003 | DE-604 | ||
005 | 20200303 | ||
007 | t | ||
008 | 130218s2013 a||| |||| 00||| eng d | ||
020 | |a 9781118104200 |c hbk |9 978-1-118-10420-0 | ||
020 | |a 111810420X |9 1-118-10420-X | ||
035 | |a (OCoLC)835297866 | ||
035 | |a (DE-599)BVBBV040755567 | ||
040 | |a DE-604 |b ger | ||
041 | 0 | |a eng | |
049 | |a DE-706 |a DE-473 | ||
084 | |a ST 302 |0 (DE-625)143652: |2 rvk | ||
245 | 1 | 0 | |a Reinforcement learning and approximate dynamic programming for feedback control |c edited by Frank L. Lewis (UTA Automation and Robotics Research Institute, Fort Worth, TX), Derong Liu (University of Illinois, Chicago, IL) |
264 | 1 | |a Piscataway, NJ |b IEEE Press |c [2013] | |
264 | 1 | |a Hoboken, NJ |b Wiley | |
264 | 4 | |c © 2013 | |
300 | |a xxvi, 613 S. |b Illustrationen, Diagramme |c 24 cm | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a IEEE Press series on computational intelligence | |
500 | |a Includes bibliographical references and index | ||
520 | |a "Reinforcement learning (RL) and adaptive dynamic programming (ADP) has been one of the most critical research fields in science and engineering for modern complex systems. This book describes the latest RL and ADP techniques for decision and control in human engineered systems, covering both single player decision and control and multi-player games. Edited by the pioneers of RL and ADP research, the book brings together ideas and methods from many fields and provides an important and timely guidance on controlling a wide variety of systems, such as robots, industrial processes, and economic decision-making"-- | ||
650 | 4 | |a Reinforcement learning | |
650 | 4 | |a Feedback control systems | |
650 | 7 | |a TECHNOLOGY & ENGINEERING / Electronics / General |2 bisacsh | |
650 | 0 | 7 | |a Bestärkendes Lernen |g Künstliche Intelligenz |0 (DE-588)4825546-4 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Dynamische Optimierung |0 (DE-588)4125677-3 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Bestärkendes Lernen |g Künstliche Intelligenz |0 (DE-588)4825546-4 |D s |
689 | 0 | 1 | |a Dynamische Optimierung |0 (DE-588)4125677-3 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Lewis, Frank L. |d 1949- |0 (DE-588)130084867 |4 edt | |
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe |z 978-1-118-45398-8 |
856 | 4 | 2 | |m Digitalisierung UB Bamberg |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025735246&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-025735246 |
Record in the search index
_version_ | 1804150080585859072 |
---|---|
adam_text | CONTENTS

PREFACE xix
CONTRIBUTORS xxiii

PART I  FEEDBACK CONTROL USING RL AND ADP

1. Reinforcement Learning and Approximate Dynamic Programming (RLADP) — Foundations, Common Misconceptions, and the Challenges Ahead 3
Paul J. Werbos
1.1 Introduction 3
1.2 What is RLADP? 4
1.2.1 Definition of RLADP and the Task it Addresses 4
1.2.2 Basic Tools — Bellman Equation, and Value and Policy Functions 9
1.2.3 Optimization Over Time Without Value Functions 13
1.3 Some Basic Challenges in Implementing ADP 14
1.3.1 Accounting for Unseen Variables 15
1.3.2 Offline Controller Design Versus Real-Time Learning 17
1.3.3 Model-Based Versus Model Free Designs 18
1.3.4 How to Approximate the Value Function Better 19
1.3.5 How to Choose u(t) Based on a Value Function 22
1.3.6 How to Build Cooperative Multiagent Systems with RLADP 25
References 26

2. Stable Adaptive Neural Control of Partially Observable Dynamic Systems 31
J. Nate Knight and Charles W. Anderson
2.1 Introduction 31
2.2 Background 32
2.3 Stability Bias 35
2.4 Example Application 38
2.4.1 The Simulated System 38
2.4.2 An Uncertain Linear Plant Model 40
2.4.3 The Closed Loop Control System 41
2.4.4 Determining RNN Weight Updates by Reinforcement Learning 44
2.4.5 Results 46
2.4.6 Conclusions 50
References 50

3. Optimal Control of Unknown Nonlinear Discrete-Time Systems Using the Iterative Globalized Dual Heuristic Programming Algorithm 52
Derong Liu and Ding Wang
3.1 Background Material 53
3.2 Neuro-Optimal Control Scheme Based on the Iterative ADP Algorithm 55
3.2.1 Identification of the Unknown Nonlinear System 55
3.2.2 Derivation of the Iterative ADP Algorithm 59
3.2.3 Convergence Analysis of the Iterative ADP Algorithm 59
3.2.4 Design Procedure of the Iterative ADP Algorithm 64
3.2.5 NN Implementation of the Iterative ADP Algorithm Using GDHP Technique
3.3 Generalization 67
3.4 Simulation Studies 68
3.5 Summary 74
References 74

4. Learning and Optimization in Hierarchical Adaptive Critic Design 78
Haibo He, Zhen Ni, and Dongbin Zhao
4.1 Introduction 78
4.2 Hierarchical ADP Architecture with Multiple-Goal Representation 80
4.2.1 System Level Structure 80
4.2.2 Architecture Design and Implementation 81
4.2.3 Learning and Adaptation in Hierarchical ADP 83
4.3 Case Study: The Ball-and-Beam System 87
4.3.1 Problem Formulation 88
4.3.2 Experiment Configuration and Parameters Setup 89
4.3.3 Simulation Results and Analysis
4.4 Conclusions and Future Work 94
References 95

5. Single Network Adaptive Critics Networks — Development, Analysis, and Applications
Jie Ding, Ali Heydari, and S.N. Balakrishnan
5.1 Introduction 98
5.2 Approximate Dynamic Programming 100
5.3 SNAC 102
5.3.1 State Generation for Neural Network Training 103
5.3.2 Neural Network Training 103
5.3.3 Convergence Condition 104
5.4 J-SNAC 104
5.4.1 Neural Network Training 105
5.4.2 Numerical Analysis 105
5.5 Finite-SNAC 108
5.5.1 Neural Network Training 109
5.5.2 Convergence Theorems 111
5.5.3 Numerical Analysis 112
5.6 Conclusions 116
References 116

6. Linearly Solvable Optimal Control 119
K. Dvijotham and E. Todorov
6.1 Introduction 119
6.1.1 Notation 121
6.1.2 Markov Decision Processes 122
6.2 Linearly Solvable Optimal Control Problems 123
6.2.1 Probability Shift: An Alternate View of Control 123
6.2.2 Linearly Solvable Markov Decision Processes (LMDPs) 124
6.2.3 An Alternate View of LMDPs 124
6.2.4 Other Problem Formulations 126
6.2.5 Applications 126
6.2.6 Linearly Solvable Controlled Diffusions (LDs) 127
6.2.7 Relationship Between Discrete and Continuous-Time Problems 128
6.2.8 Historical Perspective 129
6.3 Extension to Risk-Sensitive Control and Game Theory 130
6.3.1 Game Theoretic Control: Competitive Games 130
6.3.2 Rényi Divergence 130
6.3.3 Linearly Solvable Markov Games 130
6.3.4 Linearly Solvable Differential Games 133
6.3.5 Relationships Among the Different Formulations 134
6.4 Properties and Algorithms 134
6.4.1 Sampling Approximations and Path-Integral Control 134
6.4.2 Residual Minimization via Function Approximation 135
6.4.3 Natural Policy Gradient 136
6.4.4 Compositionality of Optimal Control Laws 136
6.4.5 Stochastic Maximum Principle 137
6.4.6 Inverse Optimal Control 138
6.5 Conclusions and Future Work 139
References 139

7. Approximating Optimal Control with Value Gradient Learning 142
Michael Fairbank, Danil Prokhorov, and Eduardo Alonso
7.1 Introduction 142
7.2 Value Gradient Learning and BPTT Algorithms 144
7.2.1 Preliminary Definitions 144
7.2.2 VGL(λ) Algorithm 145
7.2.3 BPTT Algorithm 147
7.3 A Convergence Proof for VGL(1) for Control with Function Approximation 148
7.3.1 Using a Greedy Policy with a Critic Function 149
7.3.2 The Equivalence of VGL(1) to BPTT 151
7.3.3 Convergence Conditions 152
7.3.4 Notes on the Ω Matrix 152
7.4 Vertical Lander Experiment 154
7.4.1 Problem Definition 154
7.4.2 Efficient Evaluation of the Greedy Policy 155
7.4.3 Observations on the Purpose of Ω 157
7.4.4 Experimental Results for Vertical Lander Problem 158
7.5 Conclusions 159
References 160

8. A Constrained Backpropagation Approach to Function Approximation and Approximate Dynamic Programming 162
Silvia Ferrari, Keith Rudd, and Gianluca Di Muro
8.1 Background 163
8.2 Constrained Backpropagation (CPROP) Approach 163
8.2.1 Neural Network Architecture and Procedural Memories 165
8.2.2 Derivation of LTM Equality Constraints and Adjoined Error Gradient 165
8.2.3 Example: Incremental Function Approximation 168
8.3 Solution of Partial Differential Equations in Nonstationary Environments 170
8.3.1 CPROP Solution of Boundary Value Problems 170
8.3.2 Example: PDE Solution on a Unit Circle 171
8.3.3 CPROP Solution to Parabolic PDEs 174
8.4 Preserving Prior Knowledge in Exploratory Adaptive Critic Designs 174
8.4.1 Derivation of LTM Constraints for Feedback Control 175
8.4.2 Constrained Adaptive Critic Design 177
8.5 Summary 179
Appendix: Algebraic ANN Control Matrices 180
References 180

9. Toward Design of Nonlinear ADP Learning Controllers with Performance Assurance 182
Jennie Si, Lei Yang, Chao Lu, Kostas S. Tsakalis, and Armando A. Rodríguez
9.1 Introduction 183
9.2 Direct Heuristic Dynamic Programming 184
9.3 A Control Theoretic View on the Direct HDP 186
9.3.1 Problem Setup 187
9.3.2 Frequency Domain Analysis of Direct HDP 189
9.3.3 Insight from Comparing Direct HDP to LQR 192
9.4 Direct HDP Design with Improved Performance Case 1 — Design Guided by a Priori LQR Information 193
9.4.1 Direct HDP Design Guided by a Priori LQR Information 193
9.4.2 Performance of the Direct HDP Beyond Linearization 195
9.5 Direct HDP Design with Improved Performance Case 2 — Direct HDP for Coordinated Damping Control of Low-Frequency Oscillation 198
9.6 Summary 201
References 202

10. Reinforcement Learning Control with Time-Dependent Agent Dynamics 203
Kenton Kirkpatrick and John Valasek
10.1 Introduction 203
10.2 Q-Learning 205
10.2.1 Q-Learning Algorithm 205
10.2.2 ε-Greedy 207
10.2.3 Function Approximation 208
10.3 Sampled Data Q-Learning 209
10.3.1 Sampled Data Q-Learning Algorithm 209
10.3.2 Example 210
10.4 System Dynamics Approximation 213
10.4.1 First-Order Dynamics Learning 214
10.4.2 Multiagent System Thought Experiment 216
10.5 Closing Remarks 218
References 219

11. Online Optimal Control of Nonaffine Nonlinear Discrete-Time Systems without Using Value and Policy Iterations 221
Hassan Zargarzadeh, Qinmin Yang, and S. Jagannathan
11.1 Introduction 221
11.2 Background 224
11.3 Reinforcement Learning Based Control 225
11.3.1 Affine-Like Dynamics 225
11.3.2 Online Reinforcement Learning Controller Design 229
11.3.3 The Action NN Design 229
11.3.4 The Critic NN Design 230
11.3.5 Weight Updating Laws for the NNs 231
11.3.6 Main Theoretic Results 232
11.4 Time-Based Adaptive Dynamic Programming-Based Optimal Control 234
11.4.1 Online NN-Based Identifier 235
11.4.2 Neural Network-Based Optimal Controller Design 237
11.4.3 Cost Function Approximation for Optimal Regulator Design 238
11.4.4 Estimation of the Optimal Feedback Control Signal 240
11.4.5 Convergence Proof 242
11.4.6 Robustness 244
11.5 Simulation Result 247
11.5.1 Reinforcement-Learning-Based Control of a Nonlinear System 247
11.5.2 The Drawback of HDP Policy Iteration Approach 250
11.5.3 OLA-Based Optimal Control Applied to HCCI Engine 251
References 255

12. An Actor-Critic-Identifier Architecture for Adaptive Approximate Optimal Control 258
S. Bhasin, R. Kamalapurkar, M. Johnson, K.G. Vamvoudakis, F.L. Lewis, and W.E. Dixon
12.1 Introduction 259
12.2 Actor-Critic-Identifier Architecture for HJB Approximation 260
12.3 Actor-Critic Design 263
12.4 Identifier Design 264
12.5 Convergence and Stability Analysis 270
12.6 Simulation 274
12.7 Conclusion 275
References 278

13. Robust Adaptive Dynamic Programming 281
Yu Jiang and Zhong-Ping Jiang
13.1 Introduction 281
13.2 Optimality Versus Robustness 283
13.2.1 Systems with Matched Disturbance Input 283
13.2.2 Adding One Integrator 284
13.2.3 Systems in Lower-Triangular Form 286
13.3 Robust-ADP Design for Disturbance Attenuation 288
13.3.1 Horizontal Learning 288
13.3.2 Vertical Learning 290
13.3.3 Robust-ADP Algorithm for Disturbance Attenuation 291
13.4 Robust-ADP for Partial-State Feedback Control 292
13.4.1 The ISS Property 293
13.4.2 Online Learning Strategy 295
13.5 Applications 296
13.5.1 Load-Frequency Control for a Power System 296
13.5.2 Machine Tool Power Drive System 298
13.6 Summary 300
References 301

PART II  LEARNING AND CONTROL IN MULTIAGENT GAMES

14. Hybrid Learning in Stochastic Games and Its Application in Network Security 305
Quanyan Zhu, Hamidou Tembine, and Tamer Başar
14.1 Introduction 305
14.1.1 Related Work 306
14.1.2 Contribution 307
14.1.3 Organization of the Chapter 308
14.2 Two-Person Game 308
14.3 Learning in NZSGs 310
14.3.1 Learning Procedures 310
14.3.2 Learning Schemes 311
14.4 Main Results 314
14.4.1 Stochastic Approximation of the Pure Learning Schemes 314
14.4.2 Stochastic Approximation of the Hybrid Learning Scheme 315
14.4.3 Connection with Equilibria of the Expected Game 317
14.5 Security Application 322
14.6 Conclusions and Future Works 326
Appendix: Assumptions for Stochastic Approximation 327
References 328

15. Integral Reinforcement Learning for Online Computation of Nash Strategies of Nonzero-Sum Differential Games 330
Draguna Vrabie and F.L. Lewis
15.1 Introduction 331
15.2 Two-Player Games and Integral Reinforcement Learning 333
15.2.1 Two-Player Nonzero-Sum Games and Nash Equilibrium 333
15.2.2 Integral Reinforcement Learning for Two-Player Nonzero-Sum Games 335
15.3 Continuous-Time Value Iteration to Solve the Riccati Equation 337
15.4 Online Algorithm to Solve Nonzero-Sum Games 339
15.4.1 Finding Stabilizing Gains to Initialize the Online Algorithm 339
15.4.2 Online Partially Model-Free Algorithm for Solving the Nonzero-Sum Differential Game 339
15.4.3 Adaptive Critic Structure for Solving the Two-Player Nash Differential Game 340
15.5 Analysis of the Online Learning Algorithm for NZS Games 342
15.5.1 Mathematical Formulation of the Online Algorithm 342
15.6 Simulation Result for the Online Game Algorithm 345
15.7 Conclusion 347
References 348

16. Online Learning Algorithms for Optimal Control and Dynamic Games 350
Kyriakos G. Vamvoudakis and Frank L. Lewis
16.1 Introduction 350
16.2 Optimal Control and the Continuous Time Hamilton-Jacobi-Bellman Equation 352
16.2.1 Optimal Control and Hamilton-Jacobi-Bellman Equation 352
16.2.2 Policy Iteration for Optimal Control 354
16.2.3 Online Synchronous Policy Iteration 355
16.2.4 Simulation 357
16.3 Online Solution of Nonlinear Two-Player Zero-Sum Games and Hamilton-Jacobi-Isaacs Equation 360
16.3.1 Zero-Sum Games and Hamilton-Jacobi-Isaacs Equation 360
16.3.2 Policy Iteration for Two-Player Zero-Sum Differential Games 361
16.3.3 Online Solution for Two-Player Zero-Sum Differential Games 362
16.3.4 Simulation 364
16.4 Online Solution of Nonlinear Nonzero-Sum Games and Coupled Hamilton-Jacobi Equations 366
16.4.1 Nonzero Sum Games and Coupled Hamilton-Jacobi Equations 367
16.4.2 Policy Iteration for Nonzero Sum Differential Games 369
16.4.3 Online Solution for Two-Player Nonzero Sum Differential Games 370
16.4.4 Simulation 372
References 376

PART III  FOUNDATIONS IN MDP AND RL

17. Lambda-Policy Iteration: A Review and a New Implementation 381
Dimitri P. Bertsekas
17.1 Introduction 381
17.2 Lambda-Policy Iteration without Cost Function Approximation 386
17.3 Approximate Policy Evaluation Using Projected Equations 388
17.3.1 Exploration-Contraction Trade-off 389
17.3.2 Bias 390
17.3.3 Bias-Variance Trade-off 390
17.3.4 TD Methods 391
17.3.5 Comparison of LSTD(λ) and LSPE(λ) 394
17.4 Lambda-Policy Iteration with Cost Function Approximation 395
17.4.1 The LSPE(λ) Implementation 396
17.4.2 λ-PI(0) — An Implementation Based on a Discounted MDP 397
17.4.3 λ-PI(1) — An Implementation Based on a Stopping Problem 398
17.4.4 Comparison with Alternative Approximate PI Methods 404
17.4.5 Exploration-Enhanced LSTD(λ) with Geometric Sampling 404
17.5 Conclusions 406
References 406

18. Optimal Learning and Approximate Dynamic Programming 410
Warren B. Powell and Ilya O. Ryzhov
18.1 Introduction 410
18.2 Modeling 411
18.3 The Four Classes of Policies 412
18.3.1 Myopic Cost Function Approximation 412
18.3.2 Lookahead Policies 413
18.3.3 Policy Function Approximation 414
18.3.4 Policies Based on Value Function Approximations 414
18.3.5 Learning Policies 415
18.4 Basic Learning Policies for Policy Search 416
18.4.1 The Belief Model 417
18.4.2 Objective Functions for Offline and Online Learning 418
18.4.3 Some Heuristic Policies 419
18.5 Optimal Learning Policies for Policy Search 421
18.5.1 The Knowledge Gradient for Offline Learning 421
18.5.2 The Knowledge Gradient for Correlated Beliefs 423
18.5.3 The Knowledge Gradient for Online Learning 425
18.5.4 The Knowledge Gradient for a Parametric Belief Model 425
18.5.5 Discussion 426
18.6 Learning with a Physical State 427
18.6.1 Heuristic Policies 428
18.6.2 The Knowledge Gradient with a Physical State 428
References 429

19. An Introduction to Event-Based Optimization: Theory and Applications 432
Xi-Ren Cao, Yanjia Zhao, Qing-Shan Jia, and Qianchuan Zhao
19.1 Introduction 432
19.2 Literature Review 433
19.3 Problem Formulation 434
19.4 Policy Iteration for EBO 435
19.4.1 Performance Difference and Derivative Formulas 435
19.4.2 Policy Iteration for EBO 440
19.5 Example: Material Handling Problem 441
19.5.1 Problem Formulation 441
19.5.2 Event-Based Optimization for the Material Handling Problem 444
19.5.3 Numerical Results 446
19.6 Conclusions 448
References 449

20. Bounds for Markov Decision Processes 452
Vijay V. Desai, Vivek F. Farias, and Ciamac C. Moallemi
20.1 Introduction 452
20.1.1 Related Literature 454
20.2 Problem Formulation 455
20.3 The Linear Programming Approach 456
20.3.1 The Exact Linear Program 456
20.3.2 Cost-to-Go Function Approximation 457
20.3.3 The Approximate Linear Program 457
20.4 The Martingale Duality Approach 458
20.5 The Pathwise Optimization Method 461
20.6 Applications 463
20.6.1 Optimal Stopping 464
20.6.2 Linear Convex Control 467
20.7 Conclusion 470
References 471

21. Approximate Dynamic Programming and Backpropagation on Timescales 474
John Seiffertt and Donald Wunsch
21.1 Introduction: Timescales Fundamentals 474
21.1.1 Single-Variable Calculus 475
21.1.2 Calculus of Multiple Variables 476
21.1.3 Extension of the Chain Rule 477
21.1.4 Induction on Timescales 479
21.2 Dynamic Programming 479
21.2.1 Dynamic Programming Overview 480
21.2.2 Dynamic Programming Algorithm on Timescales 481
21.2.3 HJB Equation on Timescales 483
21.3 Backpropagation 485
21.3.1 Ordered Derivatives 486
21.3.2 The Backpropagation Algorithm on Timescales 490
21.4 Conclusions 492
References 492

22. A Survey of Optimistic Planning in Markov Decision Processes 494
Lucian Buşoniu, Rémi Munos, and Robert Babuška
22.1 Introduction 494
22.2 Optimistic Online Optimization 497
22.2.1 Bandit Problems 497
22.2.2 Lipschitz Functions and Deterministic Samples 498
22.2.3 Lipschitz Functions and Random Samples 499
22.3 Optimistic Planning Algorithms 500
22.3.1 Optimistic Planning for Deterministic Systems 502
22.3.2 Open-Loop Optimistic Planning 504
22.3.3 Optimistic Planning for Sparsely Stochastic Systems 505
22.3.4 Theoretical Guarantees 509
22.4 Related Planning Algorithms 509
22.5 Numerical Example 510
References 515

23. Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning 517
Shalabh Bhatnagar, Vivek S. Borkar, and L.A. Prashanth
23.1 Introduction 517
23.2 The Framework 520
23.2.1 The TD(0) Learning Algorithm 521
23.3 The Feature Adaptation Scheme 522
23.3.1 The Feature Adaptation Scheme 522
23.4 Convergence Analysis 525
23.5 Application to Traffic Signal Control 527
23.6 Conclusions 532
References 533

24. Feature Selection for Neuro-Dynamic Programming 535
Dayu Huang, W. Chen, P. Mehta, S. Meyn, and A. Surana
24.1 Introduction 535
24.2 Optimality Equations 536
24.2.1 Deterministic Model 537
24.2.2 Diffusion Model 538
24.2.3 Models in Discrete Time 539
24.2.4 Approximations 539
24.3 Neuro-Dynamic Algorithms 542
24.3.1 MDP Model 542
24.3.2 TD-Learning 543
24.3.3 SARSA 546
24.3.4 Q-Learning 547
24.3.5 Architecture 550
24.4 Fluid Models 551
24.4.1 The CRW Queue 551
24.4.2 Speed-Scaling Model 552
24.5 Diffusion Models 554
24.5.1 The CRW Queue 555
24.5.2 Speed-Scaling Model 556
24.6 Mean Field Games 556
24.7 Conclusions 557
References 558

25. Approximate Dynamic Programming for Optimizing Oil Production 560
Zheng Wen, Louis J. Durlofsky, Benjamin Van Roy, and Khalid Aziz
25.1 Introduction 560
25.2 Petroleum Reservoir Production Optimization Problem 562
25.3 Review of Dynamic Programming and Approximate Dynamic Programming 564
25.4 Approximate Dynamic Programming Algorithm for Reservoir Production Optimization 566
25.4.1 Basis Function Construction 566
25.4.2 Computation of Coefficients 568
25.4.3 Solving Subproblems 570
25.4.4 Adaptive Basis Function Selection and Bootstrapping 571
25.4.5 Computational Requirements 572
25.5 Simulation Results 573
25.6 Concluding Remarks 578
References 580

26. A Learning Strategy for Source Tracking in Unstructured Environments 582
Titus Appel, Rafael Fierro, Brandon Rohrer, Ron Lumia, and John Wood
26.1 Introduction 582
26.2 Reinforcement Learning 583
26.2.1 Q-Learning 584
26.2.2 Q-Learning and Robotics 589
26.3 Light-Following Robot 589
26.4 Simulation Results 592
26.5 Experimental Results 595
26.5.1 Hardware 596
26.5.2 Problems in Hardware Implementation 597
26.5.3 Results 598
26.6 Conclusions and Future Work 599
References 599

INDEX 601
|
any_adam_object | 1 |
author2 | Lewis, Frank L. 1949- |
author2_role | edt |
author2_variant | f l l fl fll |
author_GND | (DE-588)130084867 |
author_facet | Lewis, Frank L. 1949- |
building | Verbundindex |
bvnumber | BV040755567 |
classification_rvk | ST 302 |
ctrlnum | (OCoLC)835297866 (DE-599)BVBBV040755567 |
discipline | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02649nam a2200457 c 4500</leader><controlfield tag="001">BV040755567</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20200303 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">130218s2013 a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781118104200</subfield><subfield code="c">hbk</subfield><subfield code="9">978-1-118-10420-0</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">111810420X</subfield><subfield code="9">1-118-10420-X</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)835297866</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV040755567</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-706</subfield><subfield code="a">DE-473</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 302</subfield><subfield code="0">(DE-625)143652:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Reinforcement learning and approximate dynamic programming for feedback control</subfield><subfield code="c">edited by Frank L. Lewis (UTA Automation and Robotics Research Institute, Fort Worth, TX), Derong Liu (University of Illinois, Chicago, IL)</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Piscataway, NJ</subfield><subfield code="b">IEEE Press</subfield><subfield code="c">[2013]</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Hoboken, NJ</subfield><subfield code="b">Wiley</subfield></datafield><datafield tag="264" ind1=" " ind2="4"><subfield code="c">© 2013</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xxvi, 613 S.</subfield><subfield code="b">Illustrationen, Diagramme</subfield><subfield code="c">24 cm</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">IEEE Press series on computational intelligence</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references and index</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">"Reinforcement learning (RL) and adaptive dynamic programming (ADP) has been one of the most critical research fields in science and engineering for modern complex systems. This book describes the latest RL and ADP techniques for decision and control in human engineered systems, covering both single player decision and control and multi-player games. 
Edited by the pioneers of RL and ADP research, the book brings together ideas and methods from many fields and provides an important and timely guidance on controlling a wide variety of systems, such as robots, industrial processes, and economic decision-making"--</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Reinforcement learning</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Feedback control systems</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">TECHNOLOGY & ENGINEERING / Electronics / General</subfield><subfield code="2">bisacsh</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Bestärkendes Lernen</subfield><subfield code="g">Künstliche Intelligenz</subfield><subfield code="0">(DE-588)4825546-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Dynamische Optimierung</subfield><subfield code="0">(DE-588)4125677-3</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Bestärkendes Lernen</subfield><subfield code="g">Künstliche Intelligenz</subfield><subfield code="0">(DE-588)4825546-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Dynamische Optimierung</subfield><subfield code="0">(DE-588)4125677-3</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Lewis, Frank L.</subfield><subfield code="d">1949-</subfield><subfield code="0">(DE-588)130084867</subfield><subfield code="4">edt</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="z">978-1-118-45398-8</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Bamberg</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025735246&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-025735246</subfield></datafield></record></collection> |
id | DE-604.BV040755567 |
illustrated | Illustrated |
indexdate | 2024-07-10T00:33:14Z |
institution | BVB |
isbn | 9781118104200 111810420X |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-025735246 |
oclc_num | 835297866 |
open_access_boolean | |
owner | DE-706 DE-473 DE-BY-UBG |
owner_facet | DE-706 DE-473 DE-BY-UBG |
physical | xxvi, 613 S. Illustrationen, Diagramme 24 cm |
publishDate | 2013 |
publishDateSearch | 2013 |
publishDateSort | 2013 |
publisher | IEEE Press Wiley |
record_format | marc |
series2 | IEEE Press series on computational intelligence |
spelling | Reinforcement learning and approximate dynamic programming for feedback control edited by Frank L. Lewis (UTA Automation and Robotics Research Institute, Fort Worth, TX), Derong Liu (University of Illinois, Chicago, IL) Piscataway, NJ IEEE Press [2013] Hoboken, NJ Wiley © 2013 xxvi, 613 S. Illustrationen, Diagramme 24 cm txt rdacontent n rdamedia nc rdacarrier IEEE Press series on computational intelligence Includes bibliographical references and index "Reinforcement learning (RL) and adaptive dynamic programming (ADP) has been one of the most critical research fields in science and engineering for modern complex systems. This book describes the latest RL and ADP techniques for decision and control in human engineered systems, covering both single player decision and control and multi-player games. Edited by the pioneers of RL and ADP research, the book brings together ideas and methods from many fields and provides an important and timely guidance on controlling a wide variety of systems, such as robots, industrial processes, and economic decision-making"-- Reinforcement learning Feedback control systems TECHNOLOGY & ENGINEERING / Electronics / General bisacsh Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd rswk-swf Dynamische Optimierung (DE-588)4125677-3 gnd rswk-swf Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 s Dynamische Optimierung (DE-588)4125677-3 s DE-604 Lewis, Frank L. 1949- (DE-588)130084867 edt Erscheint auch als Online-Ausgabe 978-1-118-45398-8 Digitalisierung UB Bamberg application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025735246&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Reinforcement learning and approximate dynamic programming for feedback control Reinforcement learning Feedback control systems TECHNOLOGY & ENGINEERING / Electronics / General bisacsh Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd Dynamische Optimierung (DE-588)4125677-3 gnd |
subject_GND | (DE-588)4825546-4 (DE-588)4125677-3 |
title | Reinforcement learning and approximate dynamic programming for feedback control |
title_auth | Reinforcement learning and approximate dynamic programming for feedback control |
title_exact_search | Reinforcement learning and approximate dynamic programming for feedback control |
title_full | Reinforcement learning and approximate dynamic programming for feedback control edited by Frank L. Lewis (UTA Automation and Robotics Research Institute, Fort Worth, TX), Derong Liu (University of Illinois, Chicago, IL) |
title_fullStr | Reinforcement learning and approximate dynamic programming for feedback control edited by Frank L. Lewis (UTA Automation and Robotics Research Institute, Fort Worth, TX), Derong Liu (University of Illinois, Chicago, IL) |
title_full_unstemmed | Reinforcement learning and approximate dynamic programming for feedback control edited by Frank L. Lewis (UTA Automation and Robotics Research Institute, Fort Worth, TX), Derong Liu (University of Illinois, Chicago, IL) |
title_short | Reinforcement learning and approximate dynamic programming for feedback control |
title_sort | reinforcement learning and approximate dynamic programming for feedback control |
topic | Reinforcement learning Feedback control systems TECHNOLOGY & ENGINEERING / Electronics / General bisacsh Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd Dynamische Optimierung (DE-588)4125677-3 gnd |
topic_facet | Reinforcement learning Feedback control systems TECHNOLOGY & ENGINEERING / Electronics / General Bestärkendes Lernen Künstliche Intelligenz Dynamische Optimierung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025735246&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT lewisfrankl reinforcementlearningandapproximatedynamicprogrammingforfeedbackcontrol |