Reinforcement learning and optimal control

Main author: | Bertsekas, Dimitri P. 1942- |
---|---|
Format: | Book |
Language: | English |
Published: | Belmont, Massachusetts: Athena Scientific, [2019] |
Series: | Athena scientific optimization and computation series; 1 |
Subjects: | Künstliche Intelligenz; Optimale Kontrolle; Bestärkendes Lernen (Künstliche Intelligenz); Maschinelles Lernen |
Online access: | Table of contents |
Description: | Bibliography: pages 345-367 |
Description: | xiv, 373 pages, diagrams |
ISBN: | 9781886529397 |
Internal format
MARC
LEADER 00000nam a2200000 cb4500
001 BV046202951
003 DE-604
005 20240312
007 t|
008 191017s2019 xxu|||| |||| 00||| eng d
020 __ |a 9781886529397 |c hbk |9 978-1-886529-39-7
035 __ |a (OCoLC)1126402004
035 __ |a (DE-599)KXP1671468651
040 __ |a DE-604 |b ger |e rda
041 0_ |a eng
044 __ |a xxu |c XD-US
049 __ |a DE-384 |a DE-83 |a DE-91 |a DE-739 |a DE-706 |a DE-29T |a DE-703 |a DE-20 |a DE-945 |a DE-1050 |a DE-898 |a DE-523
084 __ |a QH 740 |0 (DE-625)141614: |2 rvk
084 __ |a ST 300 |0 (DE-625)143650: |2 rvk
084 __ |a DAT 708f |2 stub
100 1_ |a Bertsekas, Dimitri P. |d 1942- |e Verfasser |0 (DE-588)171165519 |4 aut
245 10 |a Reinforcement learning and optimal control |c by Dimitri P. Bertsekas, Massachusetts Institute of Technology
264 _1 |a Belmont, Massachusetts |b Athena Scientific |c [2019]
300 __ |a xiv, 373 Seiten |b Diagramme
336 __ |b txt |2 rdacontent
337 __ |b n |2 rdamedia
338 __ |b nc |2 rdacarrier
490 1_ |a Athena scientific optimization and computation series |v 1
500 __ |a Literaturverzeichnis: Seite 345-367
650 07 |a Künstliche Intelligenz |0 (DE-588)4033447-8 |2 gnd |9 rswk-swf
650 07 |a Optimale Kontrolle |0 (DE-588)4121428-6 |2 gnd |9 rswk-swf
650 07 |a Bestärkendes Lernen |g Künstliche Intelligenz |0 (DE-588)4825546-4 |2 gnd |9 rswk-swf
650 07 |a Maschinelles Lernen |0 (DE-588)4193754-5 |2 gnd |9 rswk-swf
689 00 |a Künstliche Intelligenz |0 (DE-588)4033447-8 |D s
689 01 |a Optimale Kontrolle |0 (DE-588)4121428-6 |D s
689 02 |a Bestärkendes Lernen |g Künstliche Intelligenz |0 (DE-588)4825546-4 |D s
689 0_ |5 DE-604
689 10 |a Maschinelles Lernen |0 (DE-588)4193754-5 |D s
689 1_ |5 DE-604
830 _0 |a Athena scientific optimization and computation series |v 1 |w (DE-604)BV015264203 |9 1
856 42 |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=031582049&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
943 1_ |a oai:aleph.bib-bvb.de:BVB01-031582049
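The listing above follows a common MARC display convention: a three-character tag, up to two indicator characters ("_" marking a blank), and subfields each introduced by "|" plus a one-character code. As a minimal sketch, the hypothetical helper below splits such a display line back into structured data; it assumes subfield values contain no literal "|", which holds for the data fields in this record (control fields such as 007 and 008 would need separate handling, and real workflows would parse the MARCXML or binary serialization with a dedicated library such as pymarc rather than display text).

```python
import re

# One data-field display line from the record above.
LINE = "650 07 |a Optimale Kontrolle |0 (DE-588)4121428-6 |2 gnd |9 rswk-swf"

def parse_display_line(line: str):
    """Split a pipe-delimited MARC display line into
    (tag, indicators, [(subfield code, value), ...])."""
    head = line.split("|", 1)[0].split()  # e.g. ['650', '07']
    tag, indicators = head[0], head[1] if len(head) > 1 else ""
    # Each subfield: '|' + one-character code + value up to the next '|'.
    pairs = re.findall(r"\|(\w)\s*([^|]*)", line)
    subfields = [(code, value.strip()) for code, value in pairs]
    return tag, indicators, subfields

print(parse_display_line(LINE))
# ('650', '07', [('a', 'Optimale Kontrolle'), ('0', '(DE-588)4121428-6'),
#                ('2', 'gnd'), ('9', 'rswk-swf')])
```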
Record in the search index
_version_ | 1821138788839063552 |
adam_text | Contents:
1. Exact Dynamic Programming
1.1. Deterministic Dynamic Programming, p. 2
1.1.1. Deterministic Problems, p. 2
1.1.2. The Dynamic Programming Algorithm, p. 7
1.1.3. Approximation in Value Space, p. 12
1.2. Stochastic Dynamic Programming, p. 14
1.3. Examples, Variations, and Simplifications, p. 18
1.3.1. Deterministic Shortest Path Problems, p. 19
1.3.2. Discrete Deterministic Optimization, p. 21
1.3.3. Problems with a Termination State, p. 25
1.3.4. Forecasts, p. 26
1.3.5. Problems with Uncontrollable State Components, p. 29
1.3.6. Partial State Information and Belief States, p. 34
1.3.7. Linear Quadratic Optimal Control, p. 38
1.3.8. Systems with Unknown Parameters - Adaptive Control, p. 40
1.4. Reinforcement Learning and Optimal Control - Some Terminology, p. 43
1.5. Notes and Sources, p. 45
2. Approximation in Value Space
2.1. Approximation Approaches in Reinforcement Learning, p. 50
2.1.1. General Issues of Approximation in Value Space, p. 54
2.1.2. Off-Line and On-Line Methods, p. 56
2.1.3. Model-Based Simplification of the Lookahead Minimization, p. 57
2.1.4. Model-Free Q-Factor Approximation in Value Space, p. 58
2.1.5. Approximation in Policy Space on Top of Approximation in Value Space, p. 61
2.1.6. When is Approximation in Value Space Effective?, p. 62
2.2. Multistep Lookahead, p. 64
2.2.1. Multistep Lookahead and Rolling Horizon, p. 65
2.2.2. Multistep Lookahead and Deterministic Problems, p. 67
2.3. Problem Approximation, p. 69
2.3.1. Enforced Decomposition, p. 69
2.3.2. Probabilistic Approximation - Certainty Equivalent Control, p. 76
2.4. Rollout and the Policy Improvement Principle, p. 83
2.4.1. On-Line Rollout for Deterministic Discrete Optimization, p. 84
2.4.2. Stochastic Rollout and Monte Carlo Tree Search, p. 95
2.4.3. Rollout with an Expert, p. 104
2.5. On-Line Rollout for Deterministic Infinite-Spaces Problems - Optimization Heuristics, p. 106
2.5.1. Model Predictive Control, p. 108
2.5.2. Target Tubes and the Constrained Controllability Condition, p. 115
2.5.3. Variants of Model Predictive Control, p. 118
2.6. Notes and Sources, p. 120
3. Parametric Approximation
3.1. Approximation Architectures, p. 126
3.1.1. Linear and Nonlinear Feature-Based Architectures, p. 126
3.1.2. Training of Linear and Nonlinear Architectures, p. 134
3.1.3. Incremental Gradient and Newton Methods, p. 135
3.2. Neural Networks, p. 149
3.2.1. Training of Neural Networks, p. 153
3.2.2. Multilayer and Deep Neural Networks, p. 157
3.3. Sequential Dynamic Programming Approximation, p. 161
3.4. Q-Factor Parametric Approximation, p. 162
3.5. Parametric Approximation in Policy Space by Classification, p. 165
3.6. Notes and Sources, p. 171
4. Infinite Horizon Dynamic Programming
4.1. An Overview of Infinite Horizon Problems, p. 174
4.2. Stochastic Shortest Path Problems, p. 177
4.3. Discounted Problems, p. 187
4.4. Semi-Markov Discounted Problems, p. 192
4.5. Asynchronous Distributed Value Iteration, p. 197
4.6. Policy Iteration, p. 200
4.6.1. Exact Policy Iteration, p. 200
4.6.2. Optimistic and Multistep Lookahead Policy Iteration, p. 205
4.6.3. Policy Iteration for Q-factors, p. 208
4.7. Notes and Sources, p. 209
4.8. Appendix: Mathematical Analysis, p. 211
4.8.1. Proofs for Stochastic Shortest Path Problems, p. 212
4.8.2. Proofs for Discounted Problems, p. 217
4.8.3. Convergence of Exact and Optimistic Policy Iteration, p. 218
5. Infinite Horizon Reinforcement Learning
5.1. Approximation in Value Space - Performance Bounds, p. 222
5.1.1. Limited Lookahead, p. 224
5.1.2. Rollout and Approximate Policy Improvement, p. 227
5.1.3. Approximate Policy Iteration, p. 232
5.2. Fitted Value Iteration, p. 235
5.3. Simulation-Based Policy Iteration with Parametric Approximation, p. 239
5.3.1. Self-Learning and Actor-Critic Methods, p. 239
5.3.2. Model-Based Variant of a Critic-Only Method, p. 241
5.3.3. Model-Free Variant of a Critic-Only Method, p. 243
5.3.4. Implementation Issues of Parametric Policy Iteration, p. 246
5.3.5. Convergence Issues of Parametric Policy Iteration - Oscillations, p. 249
5.4. Q-Learning, p. 253
5.4.1. Optimistic Policy Iteration with Parametric Q-Factor Approximation - SARSA and DQN, p. 255
5.5. Additional Methods - Temporal Differences, p. 256
5.6. Exact and Approximate Linear Programming, p. 267
5.7. Approximation in Policy Space, p. 270
5.7.1. Training by Cost Optimization - Policy Gradient, Cross-Entropy, and Random Search Methods, p. 276
5.7.2. Expert-Based Supervised Learning, p. 286
5.7.3. Approximate Policy Iteration, Rollout, and Approximation in Policy Space, p. 288
5.8. Notes and Sources, p. 293
5.9. Appendix: Mathematical Analysis, p. 298
5.9.1. Performance Bounds for Multistep Lookahead, p. 299
5.9.2. Performance Bounds for Rollout, p. 301
5.9.3. Performance Bounds for Approximate Policy Iteration, p. 304
6. Aggregation
6.1. Aggregation with Representative States, p. 308
6.1.1. Continuous State and Control Space Discretization, p. 314
6.1.2. Continuous State Space - POMDP Discretization, p. 315
6.2. Aggregation with Representative Features, p. 317
6.2.1. Hard Aggregation and Error Bounds, p. 320
6.2.2. Aggregation Using Features, p. 322
6.3. Methods for Solving the Aggregate Problem, p. 328
6.3.1. Simulation-Based Policy Iteration, p. 328
6.3.2. Simulation-Based Value Iteration and Q-Learning, p. 331
6.4. Feature-Based Aggregation with a Neural Network, p. 332
6.5. Biased Aggregation, p. 334
6.6. Notes and Sources, p. 337
6.7. Appendix: Mathematical Analysis, p. 340
References, p. 345
Index, p. 369 |
any_adam_object | 1 |
author | Bertsekas, Dimitri P. 1942- |
author_GND | (DE-588)171165519 |
author_facet | Bertsekas, Dimitri P. 1942- |
author_role | aut |
author_sort | Bertsekas, Dimitri P. 1942- |
author_variant | d p b dp dpb |
building | Verbundindex |
bvnumber | BV046202951 |
classification_rvk | QH 740 ST 300 |
classification_tum | DAT 708f |
ctrlnum | (OCoLC)1126402004 (DE-599)KXP1671468651 |
discipline | Informatik Wirtschaftswissenschaften |
format | Book |
id | DE-604.BV046202951 |
illustrated | Not Illustrated |
indexdate | 2025-01-13T13:01:29Z |
institution | BVB |
isbn | 9781886529397 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-031582049 |
oclc_num | 1126402004 |
open_access_boolean | |
owner | DE-384 DE-83 DE-91 DE-BY-TUM DE-739 DE-706 DE-29T DE-703 DE-20 DE-945 DE-1050 DE-898 DE-BY-UBR DE-523 |
owner_facet | DE-384 DE-83 DE-91 DE-BY-TUM DE-739 DE-706 DE-29T DE-703 DE-20 DE-945 DE-1050 DE-898 DE-BY-UBR DE-523 |
physical | xiv, 373 Seiten Diagramme |
publishDate | 2019 |
publishDateSearch | 2019 |
publishDateSort | 2019 |
publisher | Athena Scientific |
record_format | marc |
series | Athena scientific optimization and computation series |
series2 | Athena scientific optimization and computation series |
spelling | Bertsekas, Dimitri P. 1942- Verfasser (DE-588)171165519 aut Reinforcement learning and optimal control by Dimitri P. Bertsekas, Massachusetts Institute of Technology Belmont, Massachusetts Athena Scientific [2019] xiv, 373 Seiten Diagramme txt rdacontent n rdamedia nc rdacarrier Athena scientific optimization and computation series 1 Literaturverzeichnis: Seite 345-367 Künstliche Intelligenz (DE-588)4033447-8 gnd rswk-swf Optimale Kontrolle (DE-588)4121428-6 gnd rswk-swf Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd rswk-swf Maschinelles Lernen (DE-588)4193754-5 gnd rswk-swf Künstliche Intelligenz (DE-588)4033447-8 s Optimale Kontrolle (DE-588)4121428-6 s Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 s DE-604 Maschinelles Lernen (DE-588)4193754-5 s Athena scientific optimization and computation series 1 (DE-604)BV015264203 1 Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=031582049&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Bertsekas, Dimitri P. 1942- Reinforcement learning and optimal control Athena scientific optimization and computation series Künstliche Intelligenz (DE-588)4033447-8 gnd Optimale Kontrolle (DE-588)4121428-6 gnd Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd Maschinelles Lernen (DE-588)4193754-5 gnd |
subject_GND | (DE-588)4033447-8 (DE-588)4121428-6 (DE-588)4825546-4 (DE-588)4193754-5 |
title | Reinforcement learning and optimal control |
title_auth | Reinforcement learning and optimal control |
title_exact_search | Reinforcement learning and optimal control |
title_full | Reinforcement learning and optimal control by Dimitri P. Bertsekas, Massachusetts Institute of Technology |
title_fullStr | Reinforcement learning and optimal control by Dimitri P. Bertsekas, Massachusetts Institute of Technology |
title_full_unstemmed | Reinforcement learning and optimal control by Dimitri P. Bertsekas, Massachusetts Institute of Technology |
title_short | Reinforcement learning and optimal control |
title_sort | reinforcement learning and optimal control |
topic | Künstliche Intelligenz (DE-588)4033447-8 gnd Optimale Kontrolle (DE-588)4121428-6 gnd Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd Maschinelles Lernen (DE-588)4193754-5 gnd |
topic_facet | Künstliche Intelligenz Optimale Kontrolle Bestärkendes Lernen Künstliche Intelligenz Maschinelles Lernen |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=031582049&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
volume_link | (DE-604)BV015264203 |
work_keys_str_mv | AT bertsekasdimitrip reinforcementlearningandoptimalcontrol |
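Several of the derived index fields above follow mechanical normalization rules: title_sort is the lowercased title, author_variant enumerates initial-based variants of the author's name, and work_keys_str_mv joins a normalized author and title into an "AT" (author-title) key. The sketch below reproduces the key shown above; the exact rule applied by the indexer is an assumption here (lowercase, keep letters only).

```python
import re

def normalize(text: str) -> str:
    # Assumed normalization: lowercase, strip everything but letters.
    return re.sub(r"[^a-z]", "", text.lower())

author = "Bertsekas, Dimitri P."
title = "Reinforcement learning and optimal control"

title_sort = title.lower()
work_key = f"AT {normalize(author)} {normalize(title)}"

print(title_sort)  # reinforcement learning and optimal control
print(work_key)    # AT bertsekasdimitrip reinforcementlearningandoptimalcontrol
```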