Rollout, policy iteration, and distributed reinforcement learning:

Main author: | Bertsekas, Dimitri P. 1942-
Format: | Book
Language: | English
Published: | Belmont, Massachusetts : Athena Scientific, [2020]
Subjects: | Dynamische Optimierung; Bestärkendes Lernen (Künstliche Intelligenz)
Online access: | Table of contents
Description: | Bibliography: pages 337-356
Physical description: | xi, 361 pages, diagrams
ISBN: | 9781886529076; 1886529078

Internal format
MARC
Tag | Ind1 | Ind2 | Content
---|---|---|---
LEADER | | | 00000nam a2200000 c 4500
001 | | | BV047026841
003 | | | DE-604
005 | | | 20210520
007 | | | t
008 | | | 201124s2020 xxu|||| |||| 00||| eng d
020 | | | |a 9781886529076 |9 978-1-886529-07-6
020 | | | |a 1886529078 |9 1-886529-07-8
035 | | | |a (OCoLC)1196818801
035 | | | |a (DE-599)KXP1731767242
040 | | | |a DE-604 |b ger |e rda
041 | 0 | | |a eng
044 | | | |a xxu |c XD-US
049 | | | |a DE-29T |a DE-384 |a DE-91 |a DE-739 |a DE-1050
084 | | | |a QH 423 |0 (DE-625)141577: |2 rvk
084 | | | |a MAT 917 |2 stub
084 | | | |a 54.72 |2 bkl
100 | 1 | | |a Bertsekas, Dimitri P. |d 1942- |e Verfasser |0 (DE-588)171165519 |4 aut
245 | 1 | 0 | |a Rollout, policy iteration, and distributed reinforcement learning |c by Dimitri P. Bertsekas (Arizona State University and Massachusetts Institute of Technology)
264 | | 1 | |a Belmont, Massachusetts |b Athena Scientific |c [2020]
300 | | | |a xi, 361 Seiten |b Diagramme
336 | | | |b txt |2 rdacontent
337 | | | |b n |2 rdamedia
338 | | | |b nc |2 rdacarrier
500 | | | |a Literaturverzeichnis: Seite 337-356
650 | 0 | 7 | |a Dynamische Optimierung |0 (DE-588)4125677-3 |2 gnd |9 rswk-swf
650 | 0 | 7 | |a Bestärkendes Lernen |g Künstliche Intelligenz |0 (DE-588)4825546-4 |2 gnd |9 rswk-swf
689 | 0 | 0 | |a Bestärkendes Lernen |g Künstliche Intelligenz |0 (DE-588)4825546-4 |D s
689 | 0 | 1 | |a Dynamische Optimierung |0 (DE-588)4125677-3 |D s
689 | 0 | | |5 DE-604
856 | 4 | 2 | |m Digitalisierung UB Augsburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032434201&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
999 | | | |a oai:aleph.bib-bvb.de:BVB01-032434201
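For orientation, the following minimal Python sketch shows how the main bibliographic elements of this record map onto the MARC tags and subfield codes listed above (100 = author, 245 = title, 264 = publication statement, 020 = ISBN). The marc_fields dictionary and the subfield helper are illustrative stand-ins built by hand from the table (only a selection of fields is shown); they are not the output of any MARC library.

```python
# Hand-built stand-in for a few fields of the record above; values are copied
# verbatim from the MARC table. This is an illustration of the field/subfield
# structure, not a MARC parser.
marc_fields = {
    "100": [("a", "Bertsekas, Dimitri P."), ("d", "1942-")],
    "245": [("a", "Rollout, policy iteration, and distributed reinforcement learning"),
            ("c", "by Dimitri P. Bertsekas (Arizona State University and "
                  "Massachusetts Institute of Technology)")],
    "264": [("a", "Belmont, Massachusetts"), ("b", "Athena Scientific"), ("c", "[2020]")],
    "020": [("a", "9781886529076")],
}

def subfield(tag: str, code: str) -> str:
    """Return the first subfield with the given code, or an empty string."""
    return next((value for c, value in marc_fields.get(tag, []) if c == code), "")

# Assemble a short citation from the standard tags.
citation = (f"{subfield('100', 'a')} ({subfield('264', 'c').strip('[]')}). "
            f"{subfield('245', 'a')}. {subfield('264', 'b')}. "
            f"ISBN {subfield('020', 'a')}.")
print(citation)
# -> Bertsekas, Dimitri P. (2020). Rollout, policy iteration, and distributed
#    reinforcement learning. Athena Scientific. ISBN 9781886529076.
```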
Record in the search index

_version_ | 1804181996112445440 |
adam_text | Contents

1. Exact Dynamic Programming
1.1. Deterministic Dynamic Programming  p. 2
1.1.1. Basic Finite Horizon Problem Formulation  p. 2
1.1.2. The Dynamic Programming Algorithm  p. 5
1.1.3. Approximation in Value Space  p. 7
1.2. Stochastic Dynamic Programming  p. 10
1.2.1. Finite Horizon Problems  p. 10
1.2.2. Infinite Horizon Problems - An Overview  p. 14
1.3. Examples, Variations, and Simplifications  p. 20
1.3.1. Discrete Deterministic Optimization  p. 21
1.3.2. Problems with a Termination State  p. 25
1.3.3. State Augmentation, Time Delays, and Forecasts  p. 29
1.3.4. Partial State Information and Belief States  p. 32
1.4. Reinforcement Learning and Optimal Control - Some Terminology  p. 35
1.5. Notes and Sources  p. 37

2. Rollout and Policy Improvement
2.1. Approximation in Value and Policy Space  p. 43
2.1.1. Approximation in Value Space - One-Step and Multistep Lookahead  p. 43
2.1.2. Approximation in Policy Space  p. 47
2.1.3. Combined Approximation in Value and Policy Space  p. 49
2.2. General Issues of Approximation in Value Space  p. 53
2.2.1. Model-Based and Model-Free Implementations  p. 53
2.2.2. Off-Line and On-Line Implementations  p. 54
2.2.3. Methods for Cost-to-Go Approximation  p. 56
2.2.4. Methods for Simplification of the Lookahead Minimization  p. 58
2.2.5. Simplification of the Lookahead Minimization by Q-Factor Approximation  p. 59
2.3. Rollout and the Policy Improvement Principle  p. 62
2.3.1. On-Line Rollout for Deterministic Discrete Optimization  p. 64
2.3.2. Using Multiple Base Heuristics - Parallel Rollout  p. 72
2.3.3. The Fortified Rollout Algorithm  p. 73
2.3.4. Truncated Rollout with Multistep Lookahead and Terminal Cost Approximation  p. 76
2.3.5. Rollout with Small Stage Costs and Long Horizon - Continuous-Time Rollout  p. 78
2.3.6. Rollout with an Expert  p. 88
2.4. Stochastic Rollout and Monte Carlo Tree Search  p. 91
2.4.1. Simulation-Based Implementation of the Rollout Algorithm  p. 94
2.4.2. Rollout and Monte Carlo Tree Search  p. 98
2.4.3. Randomized Policy Improvement by Monte Carlo Tree Search  p. 102
2.4.4. Rollout Parallelization  p. 103
2.4.5. The Effect of Errors in Rollout - Variance Reduction  p. 104
2.5. Rollout for Infinite-Spaces Problems - Optimization Heuristics  p. 107
2.5.1. Rollout for Infinite-Spaces Deterministic Problems  p. 107
2.5.2. Rollout Based on Stochastic Programming  p. 111
2.6. Notes and Sources  p. 114

3. Specialized Rollout Algorithms
3.1. Model Predictive Control  p. 120
3.1.1. Target Tubes and Constrained Controllability Condition  p. 127
3.1.2. Model Predictive Control with Terminal Cost  p. 131
3.1.3. Variants of Model Predictive Control  p. 132
3.2. Multiagent Rollout  p. 135
3.2.1. Multiagent Coupling Through Constraints  p. 145
3.2.2. Multiagent Rollout for Separable and Multiarmed Bandit Problems  p. 147
3.2.3. Multiagent Model Predictive Control  p. 150
3.2.4. Asynchronous Distributed Multiagent Rollout  p. 151
3.3. Constrained Rollout for Deterministic Optimization  p. 155
3.3.1. State-Constrained Rollout and Target Tubes  p. 156
3.3.2. Rollout with Trajectory Constraints  p. 160
3.3.3. Constrained Multiagent Rollout  p. 169
3.4. Constrained Rollout - Combinatorial and Discrete Optimization  p. 172
3.4.1. A General Discrete Optimization Problem  p. 172
3.4.2. Multidimensional Assignment  p. 179
3.5. Surrogate Dynamic Programming and Rollout  p. 187
3.5.1. Rollout for Bayesian Optimization  p. 189
3.6. Rollout for Minimax Control  p. 193
3.7. Notes and Sources  p. 196

4. Learning Values and Policies
4.1. Approximation Architectures  p. 204
4.1.1. Feature-Based Architectures  p. 205
4.1.2. Training of Linear and Nonlinear Architectures  p. 215
4.2. Neural Networks  p. 219
4.2.1. Training of Neural Networks  p. 223
4.2.2. Multilayer and Deep Neural Networks  p. 224
4.3. Training of Cost Functions in Approximate DP  p. 226
4.3.1. Fitted Value Iteration  p. 226
4.3.2. Q-Factor Parametric Approximation  p. 228
4.3.3. Advantage Updating - Approximating Q-Factor Differences  p. 230
4.3.4. Differential Training of Cost Differences for Rollout  p. 233
4.4. Training of Policies in Approximate DP  p. 235
4.4.1. Perpetual Rollout with Value and Policy Networks - Multiprocessor Parallelization  p. 239
4.5. Notes and Sources  p. 240

5. Infinite Horizon: Distributed and Multiagent Algorithms
5.1. Stochastic Shortest Path and Discounted Problems  p. 248
5.2. Exact and Approximate Policy Iteration  p. 260
5.2.1. Policy Iteration and Rollout  p. 261
5.2.2. Optimistic and Multistep Policy Iteration - Truncated Rollout  p. 265
5.2.3. Policy Iteration for Q-Factors  p. 268
5.2.4. Multiagent Rollout and Policy Iteration  p. 270
5.2.5. Approximation in Value Space  p. 277
5.2.6. Performance Bounds for Truncated Rollout and Approximate Policy Iteration  p. 279
5.3. Abstract View of Infinite Horizon Problems  p. 290
5.4. Multiagent Value and Policy Iteration  p. 301
5.4.1. Convergence to an Agent-by-Agent Optimal Policy  p. 305
5.4.2. Optimistic Multiagent PI Algorithms  p. 310
5.5. Asynchronous Distributed Value Iteration  p. 313
5.5.1. State Space Partitioning  p. 314
5.5.2. Asynchronous Convergence Theorem  p. 315
5.6. Asynchronous Distributed Policy Iteration  p. 318
5.6.1. Randomized Asynchronous Optimistic Policy Iteration  p. 320
5.6.2. Asynchronous Optimistic Policy Iteration with a Uniform Fixed Point  p. 323
5.6.3. Approximate Policy Iteration - Asynchronous Multiprocessor Parallelization  p. 330
5.7. Notes and Sources  p. 332

References  p. 337
Index  p. 357
|
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Bertsekas, Dimitri P. 1942- |
author_GND | (DE-588)171165519 |
author_facet | Bertsekas, Dimitri P. 1942- |
author_role | aut |
author_sort | Bertsekas, Dimitri P. 1942- |
author_variant | d p b dp dpb |
building | Verbundindex |
bvnumber | BV047026841 |
classification_rvk | QH 423 |
classification_tum | MAT 917 |
ctrlnum | (OCoLC)1196818801 (DE-599)KXP1731767242 |
discipline | Mathematik Wirtschaftswissenschaften |
discipline_str_mv | Mathematik Wirtschaftswissenschaften |
format | Book |
id | DE-604.BV047026841 |
illustrated | Not Illustrated |
index_date | 2024-07-03T16:01:14Z |
indexdate | 2024-07-10T09:00:31Z |
institution | BVB |
isbn | 9781886529076 1886529078 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-032434201 |
oclc_num | 1196818801 |
open_access_boolean | |
owner | DE-29T DE-384 DE-91 DE-BY-TUM DE-739 DE-1050 |
owner_facet | DE-29T DE-384 DE-91 DE-BY-TUM DE-739 DE-1050 |
physical | xi, 361 Seiten Diagramme |
publishDate | 2020 |
publishDateSearch | 2020 |
publishDateSort | 2020 |
publisher | Athena Scientific |
record_format | marc |
spelling | Bertsekas, Dimitri P. 1942- Verfasser (DE-588)171165519 aut Rollout, policy iteration, and distributed reinforcement learning by Dimitri P. Bertsekas (Arizona State University and Massachusetts Institute of Technology) Belmont, Massachusetts Athena Scientific [2020] xi, 361 Seiten Diagramme txt rdacontent n rdamedia nc rdacarrier Literaturverzeichnis: Seite 337-356 Dynamische Optimierung (DE-588)4125677-3 gnd rswk-swf Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd rswk-swf Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 s Dynamische Optimierung (DE-588)4125677-3 s DE-604 Digitalisierung UB Augsburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032434201&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Bertsekas, Dimitri P. 1942- Rollout, policy iteration, and distributed reinforcement learning Dynamische Optimierung (DE-588)4125677-3 gnd Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd |
subject_GND | (DE-588)4125677-3 (DE-588)4825546-4 |
title | Rollout, policy iteration, and distributed reinforcement learning |
title_auth | Rollout, policy iteration, and distributed reinforcement learning |
title_exact_search | Rollout, policy iteration, and distributed reinforcement learning |
title_exact_search_txtP | Rollout, policy iteration, and distributed reinforcement learning |
title_full | Rollout, policy iteration, and distributed reinforcement learning by Dimitri P. Bertsekas (Arizona State University and Massachusetts Institute of Technology) |
title_fullStr | Rollout, policy iteration, and distributed reinforcement learning by Dimitri P. Bertsekas (Arizona State University and Massachusetts Institute of Technology) |
title_full_unstemmed | Rollout, policy iteration, and distributed reinforcement learning by Dimitri P. Bertsekas (Arizona State University and Massachusetts Institute of Technology) |
title_short | Rollout, policy iteration, and distributed reinforcement learning |
title_sort | rollout policy iteration and distributed reinforcement learning |
topic | Dynamische Optimierung (DE-588)4125677-3 gnd Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd |
topic_facet | Dynamische Optimierung Bestärkendes Lernen Künstliche Intelligenz |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032434201&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT bertsekasdimitrip rolloutpolicyiterationanddistributedreinforcementlearning |
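The fields listed above are derived fields of the library's search index (a Solr-based discovery layer). As a hedged illustration of how such a record can be retrieved, the sketch below issues a standard Solr select query. The host, port, and core name "biblio" are assumptions for illustration only; the queried field names (isbn, id, title, author, publishDate, topic_facet) are taken from the index record above.

```python
# Minimal sketch of querying a Solr-backed discovery index for this record.
# Endpoint and core name are hypothetical; field names come from the index
# record shown above.
import json
import urllib.parse
import urllib.request

SOLR_URL = "http://localhost:8983/solr/biblio/select"  # assumed endpoint

params = urllib.parse.urlencode({
    "q": 'isbn:"9781886529076"',                     # ISBN field of this record
    "fl": "id,title,author,publishDate,topic_facet",  # fields listed above
    "wt": "json",
})

with urllib.request.urlopen(f"{SOLR_URL}?{params}") as resp:
    docs = json.load(resp)["response"]["docs"]

for doc in docs:
    # For this record the id field above is DE-604.BV047026841.
    print(doc.get("id"), "-", doc.get("title"))
```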