Reinforcement learning and approximate dynamic programming for feedback control:
"Reinforcement learning (RL) and adaptive dynamic programming (ADP) has been one of the most critical research fields in science and engineering for modern complex systems. This book describes the latest RL and ADP techniques for decision and control in human engineered systems, covering both s...
Gespeichert in:
Other Authors: | Lewis, Frank L. 1949- (Editor) |
---|---|
Format: | Book |
Language: | English |
Published: | Piscataway, NJ : IEEE Press, [2013]; Hoboken, NJ : Wiley |
Series: | IEEE Press series on computational intelligence |
Subjects: | Reinforcement learning; Feedback control systems; Bestärkendes Lernen (Künstliche Intelligenz); Dynamische Optimierung |
Online Access: | Table of contents |
Summary: | "Reinforcement learning (RL) and adaptive dynamic programming (ADP) has been one of the most critical research fields in science and engineering for modern complex systems. This book describes the latest RL and ADP techniques for decision and control in human engineered systems, covering both single player decision and control and multi-player games. Edited by the pioneers of RL and ADP research, the book brings together ideas and methods from many fields and provides an important and timely guidance on controlling a wide variety of systems, such as robots, industrial processes, and economic decision-making"-- |
Description: | Includes bibliographical references and index |
Description: | xxvi, 613 pages; illustrations, diagrams; 24 cm |
ISBN: | 9781118104200 111810420X |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV040755567 | ||
003 | DE-604 | ||
005 | 20200303 | ||
007 | t | ||
008 | 130218s2013 a||| |||| 00||| eng d | ||
020 | |a 9781118104200 |c hbk |9 978-1-118-10420-0 | ||
020 | |a 111810420X |9 1-118-10420-X | ||
035 | |a (OCoLC)835297866 | ||
035 | |a (DE-599)BVBBV040755567 | ||
040 | |a DE-604 |b ger | ||
041 | 0 | |a eng | |
049 | |a DE-706 |a DE-473 | ||
084 | |a ST 302 |0 (DE-625)143652: |2 rvk | ||
245 | 1 | 0 | |a Reinforcement learning and approximate dynamic programming for feedback control |c edited by Frank L. Lewis (UTA Automation and Robotics Research Institute, Fort Worth, TX), Derong Liu (University of Illinois, Chicago, IL) |
264 | 1 | |a Piscataway, NJ |b IEEE Press |c [2013] | |
264 | 1 | |a Hoboken, NJ |b Wiley | |
264 | 4 | |c © 2013 | |
300 | |a xxvi, 613 S. |b Illustrationen, Diagramme |c 24 cm | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a IEEE Press series on computational intelligence | |
500 | |a Includes bibliographical references and index | ||
520 | |a "Reinforcement learning (RL) and adaptive dynamic programming (ADP) has been one of the most critical research fields in science and engineering for modern complex systems. This book describes the latest RL and ADP techniques for decision and control in human engineered systems, covering both single player decision and control and multi-player games. Edited by the pioneers of RL and ADP research, the book brings together ideas and methods from many fields and provides an important and timely guidance on controlling a wide variety of systems, such as robots, industrial processes, and economic decision-making"-- | ||
650 | 4 | |a Reinforcement learning | |
650 | 4 | |a Feedback control systems | |
650 | 7 | |a TECHNOLOGY & ENGINEERING / Electronics / General |2 bisacsh | |
650 | 0 | 7 | |a Bestärkendes Lernen |g Künstliche Intelligenz |0 (DE-588)4825546-4 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Dynamische Optimierung |0 (DE-588)4125677-3 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Bestärkendes Lernen |g Künstliche Intelligenz |0 (DE-588)4825546-4 |D s |
689 | 0 | 1 | |a Dynamische Optimierung |0 (DE-588)4125677-3 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Lewis, Frank L. |d 1949- |0 (DE-588)130084867 |4 edt | |
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe |z 978-1-118-45398-8 |
856 | 4 | 2 | |m Digitalisierung UB Bamberg |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025735246&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-025735246 |
Record in the search index
_version_ | 1804150080585859072 |
---|---|
adam_text | CONTENTS

PREFACE xix
CONTRIBUTORS xxiii

PART I  FEEDBACK CONTROL USING RL AND ADP

1. Reinforcement Learning and Approximate Dynamic Programming (RLADP) — Foundations, Common Misconceptions, and the Challenges Ahead 3
Paul J. Werbos
1.1 Introduction 3
1.2 What is RLADP? 4
1.2.1 Definition of RLADP and the Task it Addresses 4
1.2.2 Basic Tools — Bellman Equation, and Value and Policy Functions 9
1.2.3 Optimization Over Time Without Value Functions 13
1.3 Some Basic Challenges in Implementing ADP 14
1.3.1 Accounting for Unseen Variables 15
1.3.2 Offline Controller Design Versus Real-Time Learning 17
1.3.3 Model-Based Versus Model Free Designs 18
1.3.4 How to Approximate the Value Function Better 19
1.3.5 How to Choose u(t) Based on a Value Function 22
1.3.6 How to Build Cooperative Multiagent Systems with RLADP 25
References 26

2. Stable Adaptive Neural Control of Partially Observable Dynamic Systems 31
J. Nate Knight and Charles W. Anderson
2.1 Introduction 31
2.2 Background 32
2.3 Stability Bias 35
2.4 Example Application 38
2.4.1 The Simulated System 38
2.4.2 An Uncertain Linear Plant Model 40
2.4.3 The Closed Loop Control System 41
2.4.4 Determining RNN Weight Updates by Reinforcement Learning 44
2.4.5 Results 46
2.4.6 Conclusions 50
References 50

3. Optimal Control of Unknown Nonlinear Discrete-Time Systems Using the Iterative Globalized Dual Heuristic Programming Algorithm 52
Derong Liu and Ding Wang
3.1 Background Material 53
3.2 Neuro-Optimal Control Scheme Based on the Iterative ADP Algorithm 55
3.2.1 Identification of the Unknown Nonlinear System 55
3.2.2 Derivation of the Iterative ADP Algorithm 59
3.2.3 Convergence Analysis of the Iterative ADP Algorithm 59
3.2.4 Design Procedure of the Iterative ADP Algorithm 64
3.2.5 NN Implementation of the Iterative ADP Algorithm Using GDHP Technique
3.3 Generalization 67
3.4 Simulation Studies 68
3.5 Summary 74
References 74

4. Learning and Optimization in Hierarchical Adaptive Critic Design 78
Haibo He, Zhen Ni, and Dongbin Zhao
4.1 Introduction 78
4.2 Hierarchical ADP Architecture with Multiple-Goal Representation 80
4.2.1 System Level Structure 80
4.2.2 Architecture Design and Implementation 81
4.2.3 Learning and Adaptation in Hierarchical ADP 83
4.3 Case Study: The Ball-and-Beam System 87
4.3.1 Problem Formulation 88
4.3.2 Experiment Configuration and Parameters Setup 89
4.3.3 Simulation Results and Analysis
4.4 Conclusions and Future Work 94
References 95

5. Single Network Adaptive Critics Networks — Development, Analysis, and Applications
Jie Ding, Ali Heydari, and S.N. Balakrishnan
5.1 Introduction 98
5.2 Approximate Dynamic Programming 100
5.3 SNAC 102
5.3.1 State Generation for Neural Network Training 103
5.3.2 Neural Network Training 103
5.3.3 Convergence Condition 104
5.4 J-SNAC 104
5.4.1 Neural Network Training 105
5.4.2 Numerical Analysis 105
5.5 Finite-SNAC 108
5.5.1 Neural Network Training 109
5.5.2 Convergence Theorems 111
5.5.3 Numerical Analysis 112
5.6 Conclusions 116
References 116

6. Linearly Solvable Optimal Control 119
K. Dvijotham and E. Todorov
6.1 Introduction 119
6.1.1 Notation 121
6.1.2 Markov Decision Processes 122
6.2 Linearly Solvable Optimal Control Problems 123
6.2.1 Probability Shift: An Alternate View of Control 123
6.2.2 Linearly Solvable Markov Decision Processes (LMDPs) 124
6.2.3 An Alternate View of LMDPs 124
6.2.4 Other Problem Formulations 126
6.2.5 Applications 126
6.2.6 Linearly Solvable Controlled Diffusions (LDs) 127
6.2.7 Relationship Between Discrete and Continuous-Time Problems 128
6.2.8 Historical Perspective 129
6.3 Extension to Risk-Sensitive Control and Game Theory 130
6.3.1 Game Theoretic Control: Competitive Games 130
6.3.2 Rényi Divergence 130
6.3.3 Linearly Solvable Markov Games 130
6.3.4 Linearly Solvable Differential Games 133
6.3.5 Relationships Among the Different Formulations 134
6.4 Properties and Algorithms 134
6.4.1 Sampling Approximations and Path-Integral Control 134
6.4.2 Residual Minimization via Function Approximation 135
6.4.3 Natural Policy Gradient 136
6.4.4 Compositionality of Optimal Control Laws 136
6.4.5 Stochastic Maximum Principle 137
6.4.6 Inverse Optimal Control 138
6.5 Conclusions and Future Work 139
References 139

7. Approximating Optimal Control with Value Gradient Learning 142
Michael Fairbank, Danil Prokhorov, and Eduardo Alonso
7.1 Introduction 142
7.2 Value Gradient Learning and BPTT Algorithms 144
7.2.1 Preliminary Definitions 144
7.2.2 VGL(λ) Algorithm 145
7.2.3 BPTT Algorithm 147
7.3 A Convergence Proof for VGL(1) for Control with Function Approximation 148
7.3.1 Using a Greedy Policy with a Critic Function 149
7.3.2 The Equivalence of VGL(1) to BPTT 151
7.3.3 Convergence Conditions 152
7.3.4 Notes on the Ω Matrix 152
7.4 Vertical Lander Experiment 154
7.4.1 Problem Definition 154
7.4.2 Efficient Evaluation of the Greedy Policy 155
7.4.3 Observations on the Purpose of Ω 157
7.4.4 Experimental Results for Vertical Lander Problem 158
7.5 Conclusions 159
References 160

8. A Constrained Backpropagation Approach to Function Approximation and Approximate Dynamic Programming 162
Silvia Ferrari, Keith Rudd, and Gianluca Di Muro
8.1 Background 163
8.2 Constrained Backpropagation (CPROP) Approach 163
8.2.1 Neural Network Architecture and Procedural Memories 165
8.2.2 Derivation of LTM Equality Constraints and Adjoined Error Gradient 165
8.2.3 Example: Incremental Function Approximation 168
8.3 Solution of Partial Differential Equations in Nonstationary Environments 170
8.3.1 CPROP Solution of Boundary Value Problems 170
8.3.2 Example: PDE Solution on a Unit Circle 171
8.3.3 CPROP Solution to Parabolic PDEs 174
8.4 Preserving Prior Knowledge in Exploratory Adaptive Critic Designs 174
8.4.1 Derivation of LTM Constraints for Feedback Control 175
8.4.2 Constrained Adaptive Critic Design 177
8.5 Summary 179
Appendix: Algebraic ANN Control Matrices 180
References 180

9. Toward Design of Nonlinear ADP Learning Controllers with Performance Assurance 182
Jennie Si, Lei Yang, Chao Lu, Kostas S. Tsakalis, and Armando A. Rodríguez
9.1 Introduction 183
9.2 Direct Heuristic Dynamic Programming 184
9.3 A Control Theoretic View on the Direct HDP 186
9.3.1 Problem Setup 187
9.3.2 Frequency Domain Analysis of Direct HDP 189
9.3.3 Insight from Comparing Direct HDP to LQR 192
9.4 Direct HDP Design with Improved Performance Case 1 — Design Guided by a Priori LQR Information 193
9.4.1 Direct HDP Design Guided by a Priori LQR Information 193
9.4.2 Performance of the Direct HDP Beyond Linearization 195
9.5 Direct HDP Design with Improved Performance Case 2 — Direct HDP for Coordinated Damping Control of Low-Frequency Oscillation 198
9.6 Summary 201
References 202

10. Reinforcement Learning Control with Time-Dependent Agent Dynamics 203
Kenton Kirkpatrick and John Valasek
10.1 Introduction 203
10.2 Q-Learning 205
10.2.1 Q-Learning Algorithm 205
10.2.2 ε-Greedy 207
10.2.3 Function Approximation 208
10.3 Sampled Data Q-Learning 209
10.3.1 Sampled Data Q-Learning Algorithm 209
10.3.2 Example 210
10.4 System Dynamics Approximation 213
10.4.1 First-Order Dynamics Learning 214
10.4.2 Multiagent System Thought Experiment 216
10.5 Closing Remarks 218
References 219

11. Online Optimal Control of Nonaffine Nonlinear Discrete-Time Systems without Using Value and Policy Iterations 221
Hassan Zargarzadeh, Qinmin Yang, and S. Jagannathan
11.1 Introduction 221
11.2 Background 224
11.3 Reinforcement Learning Based Control 225
11.3.1 Affine-Like Dynamics 225
11.3.2 Online Reinforcement Learning Controller Design 229
11.3.3 The Action NN Design 229
11.3.4 The Critic NN Design 230
11.3.5 Weight Updating Laws for the NNs 231
11.3.6 Main Theoretic Results 232
11.4 Time-Based Adaptive Dynamic Programming-Based Optimal Control 234
11.4.1 Online NN-Based Identifier 235
11.4.2 Neural Network-Based Optimal Controller Design 237
11.4.3 Cost Function Approximation for Optimal Regulator Design 238
11.4.4 Estimation of the Optimal Feedback Control Signal 240
11.4.5 Convergence Proof 242
11.4.6 Robustness 244
11.5 Simulation Result 247
11.5.1 Reinforcement-Learning-Based Control of a Nonlinear System 247
11.5.2 The Drawback of HDP Policy Iteration Approach 250
11.5.3 OLA-Based Optimal Control Applied to HCCI Engine 251
References 255

12. An Actor-Critic-Identifier Architecture for Adaptive Approximate Optimal Control 258
S. Bhasin, R. Kamalapurkar, M. Johnson, K.G. Vamvoudakis, F.L. Lewis, and W.E. Dixon
12.1 Introduction 259
12.2 Actor-Critic-Identifier Architecture for HJB Approximation 260
12.3 Actor-Critic Design 263
12.4 Identifier Design 264
12.5 Convergence and Stability Analysis 270
12.6 Simulation 274
12.7 Conclusion 275
References 278

13. Robust Adaptive Dynamic Programming 281
Yu Jiang and Zhong-Ping Jiang
13.1 Introduction 281
13.2 Optimality Versus Robustness 283
13.2.1 Systems with Matched Disturbance Input 283
13.2.2 Adding One Integrator 284
13.2.3 Systems in Lower-Triangular Form 286
13.3 Robust-ADP Design for Disturbance Attenuation 288
13.3.1 Horizontal Learning 288
13.3.2 Vertical Learning 290
13.3.3 Robust-ADP Algorithm for Disturbance Attenuation 291
13.4 Robust-ADP for Partial-State Feedback Control 292
13.4.1 The ISS Property 293
13.4.2 Online Learning Strategy 295
13.5 Applications 296
13.5.1 Load-Frequency Control for a Power System 296
13.5.2 Machine Tool Power Drive System 298
13.6 Summary 300
References 301

PART II  LEARNING AND CONTROL IN MULTIAGENT GAMES

14. Hybrid Learning in Stochastic Games and Its Application in Network Security 305
Quanyan Zhu, Hamidou Tembine, and Tamer Başar
14.1 Introduction 305
14.1.1 Related Work 306
14.1.2 Contribution 307
14.1.3 Organization of the Chapter 308
14.2 Two-Person Game 308
14.3 Learning in NZSGs 310
14.3.1 Learning Procedures 310
14.3.2 Learning Schemes 311
14.4 Main Results 314
14.4.1 Stochastic Approximation of the Pure Learning Schemes 314
14.4.2 Stochastic Approximation of the Hybrid Learning Scheme 315
14.4.3 Connection with Equilibria of the Expected Game 317
14.5 Security Application 322
14.6 Conclusions and Future Works 326
Appendix: Assumptions for Stochastic Approximation 327
References 328

15. Integral Reinforcement Learning for Online Computation of Nash Strategies of Nonzero-Sum Differential Games 330
Draguna Vrabie and F.L. Lewis
15.1 Introduction 331
15.2 Two-Player Games and Integral Reinforcement Learning 333
15.2.1 Two-Player Nonzero-Sum Games and Nash Equilibrium 333
15.2.2 Integral Reinforcement Learning for Two-Player Nonzero-Sum Games 335
15.3 Continuous-Time Value Iteration to Solve the Riccati Equation 337
15.4 Online Algorithm to Solve Nonzero-Sum Games 339
15.4.1 Finding Stabilizing Gains to Initialize the Online Algorithm 339
15.4.2 Online Partially Model-Free Algorithm for Solving the Nonzero-Sum Differential Game 339
15.4.3 Adaptive Critic Structure for Solving the Two-Player Nash Differential Game 340
15.5 Analysis of the Online Learning Algorithm for NZS Games 342
15.5.1 Mathematical Formulation of the Online Algorithm 342
15.6 Simulation Result for the Online Game Algorithm 345
15.7 Conclusion 347
References 348

16. Online Learning Algorithms for Optimal Control and Dynamic Games 350
Kyriakos G. Vamvoudakis and Frank L. Lewis
16.1 Introduction 350
16.2 Optimal Control and the Continuous Time Hamilton-Jacobi-Bellman Equation 352
16.2.1 Optimal Control and Hamilton-Jacobi-Bellman Equation 352
16.2.2 Policy Iteration for Optimal Control 354
16.2.3 Online Synchronous Policy Iteration 355
16.2.4 Simulation 357
16.3 Online Solution of Nonlinear Two-Player Zero-Sum Games and Hamilton-Jacobi-Isaacs Equation 360
16.3.1 Zero-Sum Games and Hamilton-Jacobi-Isaacs Equation 360
16.3.2 Policy Iteration for Two-Player Zero-Sum Differential Games 361
16.3.3 Online Solution for Two-Player Zero-Sum Differential Games 362
16.3.4 Simulation 364
16.4 Online Solution of Nonlinear Nonzero-Sum Games and Coupled Hamilton-Jacobi Equations 366
16.4.1 Nonzero Sum Games and Coupled Hamilton-Jacobi Equations 367
16.4.2 Policy Iteration for Nonzero Sum Differential Games 369
16.4.3 Online Solution for Two-Player Nonzero Sum Differential Games 370
16.4.4 Simulation 372
References 376

PART III  FOUNDATIONS IN MDP AND RL

17. Lambda-Policy Iteration: A Review and a New Implementation 381
Dimitri P. Bertsekas
17.1 Introduction 381
17.2 Lambda-Policy Iteration without Cost Function Approximation 386
17.3 Approximate Policy Evaluation Using Projected Equations 388
17.3.1 Exploration-Contraction Trade-off 389
17.3.2 Bias 390
17.3.3 Bias-Variance Trade-off 390
17.3.4 TD Methods 391
17.3.5 Comparison of LSTD(λ) and LSPE(λ) 394
17.4 Lambda-Policy Iteration with Cost Function Approximation 395
17.4.1 The LSPE(λ) Implementation 396
17.4.2 λ-PI(0) — An Implementation Based on a Discounted MDP 397
17.4.3 λ-PI(1) — An Implementation Based on a Stopping Problem 398
17.4.4 Comparison with Alternative Approximate PI Methods 404
17.4.5 Exploration-Enhanced LSTD(λ) with Geometric Sampling 404
17.5 Conclusions 406
References 406

18. Optimal Learning and Approximate Dynamic Programming 410
Warren B. Powell and Ilya O. Ryzhov
18.1 Introduction 410
18.2 Modeling 411
18.3 The Four Classes of Policies 412
18.3.1 Myopic Cost Function Approximation 412
18.3.2 Lookahead Policies 413
18.3.3 Policy Function Approximation 414
18.3.4 Policies Based on Value Function Approximations 414
18.3.5 Learning Policies 415
18.4 Basic Learning Policies for Policy Search 416
18.4.1 The Belief Model 417
18.4.2 Objective Functions for Offline and Online Learning 418
18.4.3 Some Heuristic Policies 419
18.5 Optimal Learning Policies for Policy Search 421
18.5.1 The Knowledge Gradient for Offline Learning 421
18.5.2 The Knowledge Gradient for Correlated Beliefs 423
18.5.3 The Knowledge Gradient for Online Learning 425
18.5.4 The Knowledge Gradient for a Parametric Belief Model 425
18.5.5 Discussion 426
18.6 Learning with a Physical State 427
18.6.1 Heuristic Policies 428
18.6.2 The Knowledge Gradient with a Physical State 428
References 429

19. An Introduction to Event-Based Optimization: Theory and Applications 432
Xi-Ren Cao, Yanjia Zhao, Qing-Shan Jia, and Qianchuan Zhao
19.1 Introduction 432
19.2 Literature Review 433
19.3 Problem Formulation 434
19.4 Policy Iteration for EBO 435
19.4.1 Performance Difference and Derivative Formulas 435
19.4.2 Policy Iteration for EBO 440
19.5 Example: Material Handling Problem 441
19.5.1 Problem Formulation 441
19.5.2 Event-Based Optimization for the Material Handling Problem 444
19.5.3 Numerical Results 446
19.6 Conclusions 448
References 449

20. Bounds for Markov Decision Processes 452
Vijay V. Desai, Vivek F. Farias, and Ciamac C. Moallemi
20.1 Introduction 452
20.1.1 Related Literature 454
20.2 Problem Formulation 455
20.3 The Linear Programming Approach 456
20.3.1 The Exact Linear Program 456
20.3.2 Cost-to-Go Function Approximation 457
20.3.3 The Approximate Linear Program 457
20.4 The Martingale Duality Approach 458
20.5 The Pathwise Optimization Method 461
20.6 Applications 463
20.6.1 Optimal Stopping 464
20.6.2 Linear Convex Control 467
20.7 Conclusion 470
References 471

21. Approximate Dynamic Programming and Backpropagation on Timescales 474
John Seiffertt and Donald Wunsch
21.1 Introduction: Timescales Fundamentals 474
21.1.1 Single-Variable Calculus 475
21.1.2 Calculus of Multiple Variables 476
21.1.3 Extension of the Chain Rule 477
21.1.4 Induction on Timescales 479
21.2 Dynamic Programming 479
21.2.1 Dynamic Programming Overview 480
21.2.2 Dynamic Programming Algorithm on Timescales 481
21.2.3 HJB Equation on Timescales 483
21.3 Backpropagation 485
21.3.1 Ordered Derivatives 486
21.3.2 The Backpropagation Algorithm on Timescales 490
21.4 Conclusions 492
References 492

22. A Survey of Optimistic Planning in Markov Decision Processes 494
Lucian Buşoniu, Rémi Munos, and Robert Babuška
22.1 Introduction 494
22.2 Optimistic Online Optimization 497
22.2.1 Bandit Problems 497
22.2.2 Lipschitz Functions and Deterministic Samples 498
22.2.3 Lipschitz Functions and Random Samples 499
22.3 Optimistic Planning Algorithms 500
22.3.1 Optimistic Planning for Deterministic Systems 502
22.3.2 Open-Loop Optimistic Planning 504
22.3.3 Optimistic Planning for Sparsely Stochastic Systems 505
22.3.4 Theoretical Guarantees 509
22.4 Related Planning Algorithms 509
22.5 Numerical Example 510
References 515

23. Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning 517
Shalabh Bhatnagar, Vivek S. Borkar, and L.A. Prashanth
23.1 Introduction 517
23.2 The Framework 520
23.2.1 The TD(0) Learning Algorithm 521
23.3 The Feature Adaptation Scheme 522
23.3.1 The Feature Adaptation Scheme 522
23.4 Convergence Analysis 525
23.5 Application to Traffic Signal Control 527
23.6 Conclusions 532
References 533

24. Feature Selection for Neuro-Dynamic Programming 535
Dayu Huang, W. Chen, P. Mehta, S. Meyn, and A. Surana
24.1 Introduction 535
24.2 Optimality Equations 536
24.2.1 Deterministic Model 537
24.2.2 Diffusion Model 538
24.2.3 Models in Discrete Time 539
24.2.4 Approximations 539
24.3 Neuro-Dynamic Algorithms 542
24.3.1 MDP Model 542
24.3.2 TD-Learning 543
24.3.3 SARSA 546
24.3.4 Q-Learning 547
24.3.5 Architecture 550
24.4 Fluid Models 551
24.4.1 The CRW Queue 551
24.4.2 Speed-Scaling Model 552
24.5 Diffusion Models 554
24.5.1 The CRW Queue 555
24.5.2 Speed-Scaling Model 556
24.6 Mean Field Games 556
24.7 Conclusions 557
References 558

25. Approximate Dynamic Programming for Optimizing Oil Production 560
Zheng Wen, Louis J. Durlofsky, Benjamin Van Roy, and Khalid Aziz
25.1 Introduction 560
25.2 Petroleum Reservoir Production Optimization Problem 562
25.3 Review of Dynamic Programming and Approximate Dynamic Programming 564
25.4 Approximate Dynamic Programming Algorithm for Reservoir Production Optimization 566
25.4.1 Basis Function Construction 566
25.4.2 Computation of Coefficients 568
25.4.3 Solving Subproblems 570
25.4.4 Adaptive Basis Function Selection and Bootstrapping 571
25.4.5 Computational Requirements 572
25.5 Simulation Results 573
25.6 Concluding Remarks 578
References 580

26. A Learning Strategy for Source Tracking in Unstructured Environments 582
Titus Appel, Rafael Fierro, Brandon Rohrer, Ron Lumia, and John Wood
26.1 Introduction 582
26.2 Reinforcement Learning 583
26.2.1 Q-Learning 584
26.2.2 Q-Learning and Robotics 589
26.3 Light-Following Robot 589
26.4 Simulation Results 592
26.5 Experimental Results 595
26.5.1 Hardware 596
26.5.2 Problems in Hardware Implementation 597
26.5.3 Results 598
26.6 Conclusions and Future Work 599
References 599

INDEX 601
|
any_adam_object | 1 |
author2 | Lewis, Frank L. 1949- |
author2_role | edt |
author2_variant | f l l fl fll |
author_GND | (DE-588)130084867 |
author_facet | Lewis, Frank L. 1949- |
building | Verbundindex |
bvnumber | BV040755567 |
classification_rvk | ST 302 |
ctrlnum | (OCoLC)835297866 (DE-599)BVBBV040755567 |
discipline | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02649nam a2200457 c 4500</leader><controlfield tag="001">BV040755567</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20200303 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">130218s2013 a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781118104200</subfield><subfield code="c">hbk</subfield><subfield code="9">978-1-118-10420-0</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">111810420X</subfield><subfield code="9">1-118-10420-X</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)835297866</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV040755567</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-706</subfield><subfield code="a">DE-473</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 302</subfield><subfield code="0">(DE-625)143652:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Reinforcement learning and approximate dynamic programming for feedback control</subfield><subfield code="c">edited by Frank L. Lewis (UTA Automation and Robotics Research Institute, Fort Worth, TX), Derong Liu (University of Illinois, Chicago, IL)</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Piscataway, NJ</subfield><subfield code="b">IEEE Press</subfield><subfield code="c">[2013]</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Hoboken, NJ</subfield><subfield code="b">Wiley</subfield></datafield><datafield tag="264" ind1=" " ind2="4"><subfield code="c">© 2013</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xxvi, 613 S.</subfield><subfield code="b">Illustrationen, Diagramme</subfield><subfield code="c">24 cm</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">IEEE Press series on computational intelligence</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references and index</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">"Reinforcement learning (RL) and adaptive dynamic programming (ADP) has been one of the most critical research fields in science and engineering for modern complex systems. This book describes the latest RL and ADP techniques for decision and control in human engineered systems, covering both single player decision and control and multi-player games. 
Edited by the pioneers of RL and ADP research, the book brings together ideas and methods from many fields and provides an important and timely guidance on controlling a wide variety of systems, such as robots, industrial processes, and economic decision-making"--</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Reinforcement learning</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Feedback control systems</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">TECHNOLOGY & ENGINEERING / Electronics / General</subfield><subfield code="2">bisacsh</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Bestärkendes Lernen</subfield><subfield code="g">Künstliche Intelligenz</subfield><subfield code="0">(DE-588)4825546-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Dynamische Optimierung</subfield><subfield code="0">(DE-588)4125677-3</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Bestärkendes Lernen</subfield><subfield code="g">Künstliche Intelligenz</subfield><subfield code="0">(DE-588)4825546-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Dynamische Optimierung</subfield><subfield code="0">(DE-588)4125677-3</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Lewis, Frank L.</subfield><subfield code="d">1949-</subfield><subfield code="0">(DE-588)130084867</subfield><subfield code="4">edt</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="z">978-1-118-45398-8</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Bamberg</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025735246&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-025735246</subfield></datafield></record></collection> |
id | DE-604.BV040755567 |
illustrated | Illustrated |
indexdate | 2024-07-10T00:33:14Z |
institution | BVB |
isbn | 9781118104200 111810420X |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-025735246 |
oclc_num | 835297866 |
open_access_boolean | |
owner | DE-706 DE-473 DE-BY-UBG |
owner_facet | DE-706 DE-473 DE-BY-UBG |
physical | xxvi, 613 S. Illustrationen, Diagramme 24 cm |
publishDate | 2013 |
publishDateSearch | 2013 |
publishDateSort | 2013 |
publisher | IEEE Press Wiley |
record_format | marc |
series2 | IEEE Press series on computational intelligence |
spelling | Reinforcement learning and approximate dynamic programming for feedback control edited by Frank L. Lewis (UTA Automation and Robotics Research Institute, Fort Worth, TX), Derong Liu (University of Illinois, Chicago, IL) Piscataway, NJ IEEE Press [2013] Hoboken, NJ Wiley © 2013 xxvi, 613 S. Illustrationen, Diagramme 24 cm txt rdacontent n rdamedia nc rdacarrier IEEE Press series on computational intelligence Includes bibliographical references and index "Reinforcement learning (RL) and adaptive dynamic programming (ADP) has been one of the most critical research fields in science and engineering for modern complex systems. This book describes the latest RL and ADP techniques for decision and control in human engineered systems, covering both single player decision and control and multi-player games. Edited by the pioneers of RL and ADP research, the book brings together ideas and methods from many fields and provides an important and timely guidance on controlling a wide variety of systems, such as robots, industrial processes, and economic decision-making"-- Reinforcement learning Feedback control systems TECHNOLOGY & ENGINEERING / Electronics / General bisacsh Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd rswk-swf Dynamische Optimierung (DE-588)4125677-3 gnd rswk-swf Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 s Dynamische Optimierung (DE-588)4125677-3 s DE-604 Lewis, Frank L. 1949- (DE-588)130084867 edt Erscheint auch als Online-Ausgabe 978-1-118-45398-8 Digitalisierung UB Bamberg application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025735246&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Reinforcement learning and approximate dynamic programming for feedback control Reinforcement learning Feedback control systems TECHNOLOGY & ENGINEERING / Electronics / General bisacsh Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd Dynamische Optimierung (DE-588)4125677-3 gnd |
subject_GND | (DE-588)4825546-4 (DE-588)4125677-3 |
title | Reinforcement learning and approximate dynamic programming for feedback control |
title_auth | Reinforcement learning and approximate dynamic programming for feedback control |
title_exact_search | Reinforcement learning and approximate dynamic programming for feedback control |
title_full | Reinforcement learning and approximate dynamic programming for feedback control edited by Frank L. Lewis (UTA Automation and Robotics Research Institute, Fort Worth, TX), Derong Liu (University of Illinois, Chicago, IL) |
title_fullStr | Reinforcement learning and approximate dynamic programming for feedback control edited by Frank L. Lewis (UTA Automation and Robotics Research Institute, Fort Worth, TX), Derong Liu (University of Illinois, Chicago, IL) |
title_full_unstemmed | Reinforcement learning and approximate dynamic programming for feedback control edited by Frank L. Lewis (UTA Automation and Robotics Research Institute, Fort Worth, TX), Derong Liu (University of Illinois, Chicago, IL) |
title_short | Reinforcement learning and approximate dynamic programming for feedback control |
title_sort | reinforcement learning and approximate dynamic programming for feedback control |
topic | Reinforcement learning Feedback control systems TECHNOLOGY & ENGINEERING / Electronics / General bisacsh Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd Dynamische Optimierung (DE-588)4125677-3 gnd |
topic_facet | Reinforcement learning Feedback control systems TECHNOLOGY & ENGINEERING / Electronics / General Bestärkendes Lernen Künstliche Intelligenz Dynamische Optimierung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025735246&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT lewisfrankl reinforcementlearningandapproximatedynamicprogrammingforfeedbackcontrol |