Availability: Dynamic programming and optimal control

Dynamic programming and optimal control: Volume 2 Approximate dynamic programming

Bibliographic Details
Main Author:	Bertsekas, Dimitri P. 1942- (Author)
Format:	Book
Language:	English
Published:	Belmont, Mass. Athena Scientific [2012]
Edition:	Fourth edition
Online Access:	Table of contents
Description:	xvii, 694 pages, diagrams
ISBN:	1886529442 9781886529441

Internal format

MARC


LEADER	00000nam a2200000 cc4500
001	BV040463410
003	DE-604
005	20171220
007	t
008	121010s2012 \|\|\|\| \|\|\|\| 00\|\|\| eng d
020			\|a 1886529442 \|9 1-886529-44-2
020			\|a 9781886529441 \|9 978-1-886529-44-1
035			\|a (OCoLC)815947736
035			\|a (DE-599)BVBBV040463410
040			\|a DE-604 \|b ger \|e rda
041	0		\|a eng
049			\|a DE-706 \|a DE-384 \|a DE-739 \|a DE-29T \|a DE-573 \|a DE-188 \|a DE-83 \|a DE-523 \|a DE-634 \|a DE-91G \|a DE-91
084			\|a SK 880 \|0 (DE-625)143266: \|2 rvk
084			\|a SM 613 \|0 (DE-625)143297: \|2 rvk
084			\|a MAT 917f \|2 stub
100	1		\|a Bertsekas, Dimitri P. \|d 1942- \|e Verfasser \|0 (DE-588)171165519 \|4 aut
245	1	0	\|a Dynamic programming and optimal control \|n Volume 2 \|p Approximate dynamic programming \|c Dimitri P. Bertsekas
250			\|a Fourth edition
264		1	\|a Belmont, Mass. \|b Athena Scientific \|c [2012]
300			\|a xvii, 694 Seiten \|b Diagramme
336			\|b txt \|2 rdacontent
337			\|b n \|2 rdamedia
338			\|b nc \|2 rdacarrier
773	0	8	\|w (DE-604)BV011951112 \|g 2
856	4	2	\|m Digitalisierung UB Passau - ADAM Catalogue Enrichment \|q application/pdf \|u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025310808&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA \|3 Inhaltsverzeichnis
999			\|a oai:aleph.bib-bvb.de:BVB01-025310808

Record in the search index

_version_	1804149529558122496
adam_text	Contents
1. Discounted Problems - Theory
1.1. Minimization of Total Cost - Introduction (p. 3); 1.1.1. The Finite-Horizon DP Algorithm (p. 5); 1.1.2. Shorthand Notation and Monotonicity (p. 6); 1.1.3. A Preview of Infinite Horizon Results (p. 10); 1.1.4. Randomized and History-Dependent Policies (p. 11); 1.2. Discounted Problems - Bounded Cost per Stage (p. 14); 1.3. Scheduling and Multiarmed Bandit Problems (p. 22); 1.4. Discounted Continuous-Time Problems (p. 32); 1.5. The Role of Contraction Mappings (p. 45); 1.5.1. Sup-Norm Contractions (p. 47); 1.5.2. Discounted Problems - Unbounded Cost per Stage (p. 54); 1.6. General Forms of Discounted Dynamic Programming (p. 57); 1.6.1. Basic Results Under Contraction and Monotonicity (p. 63); 1.6.2. Discounted Dynamic Games (p. 69); 1.7. Notes, Sources, and Exercises (p. 71)
2. Discounted Problems - Computational Methods
2.1. Markovian Decision Problems (p. 82); 2.2. Value Iteration (p. 84); 2.2.1. Monotonic Error Bounds for Value Iteration (p. 85); 2.2.2. Variants of Value Iteration (p. 92); 2.2.3. Q-Learning (p. 95); 2.3. Policy Iteration (p. 97); 2.3.1. Policy Iteration for Costs (p. 97); 2.3.2. Policy Iteration for Q-Factors (p. 102); 2.3.3. Optimistic Policy Iteration (p. 103); 2.3.4. Limited Lookahead Policies and Rollout (p. 106); 2.4. Linear Programming Methods (p. 112); 2.5. Methods for General Discounted Problems (p. 115); 2.5.1. Limited Lookahead Policies and Approximations (p. 117); 2.5.2. Generalized Value Iteration (p. 119); 2.5.3. Approximate Value Iteration (p. 120); 2.5.4. Generalized Policy Iteration (p. 123); 2.5.5. Generalized Optimistic Policy Iteration (p. 126); 2.5.6. Approximate Policy Iteration (p. 132); 2.5.7. Mathematical Programming (p. 137); 2.6. Asynchronous Algorithms (p. 138); 2.6.1. Asynchronous Value Iteration (p. 138); 2.6.2. Asynchronous Policy Iteration (p. 144); 2.6.3. Policy Iteration with a Uniform Fixed Point (p. 149); 2.7. Notes, Sources, and Exercises (p. 156)
3. Stochastic Shortest Path Problems
3.1. Problem Formulation (p. 172); 3.2. Main Results (p. 175); 3.3. Underlying Contraction Properties (p. 182); 3.4. Value Iteration (p. 184); 3.4.1. Conditions for Finite Termination (p. 185); 3.4.2. Asynchronous Value Iteration (p. 188); 3.5. Policy Iteration (p. 189); 3.5.1. Optimistic Policy Iteration (p. 190); 3.5.2. Approximate Policy Iteration (p. 191); 3.5.3. Policy Iteration with Improper Policies (p. 193); 3.5.4. Policy Iteration with a Uniform Fixed Point (p. 197); 3.6. Countable-State Problems (p. 201); 3.7. Notes, Sources, and Exercises (p. 204)
4. Undiscounted Problems
4.1. Unbounded Costs per Stage (p. 214); 4.1.1. Main Results (p. 216); 4.1.2. Value Iteration (p. 224); 4.1.3. Other Computational Methods (p. 230); 4.2. Linear Systems and Quadratic Cost (p. 231); 4.3. Inventory Control (p. 233); 4.4. Optimal Stopping (p. 235); 4.5. Optimal Gambling Strategies (p. 241); 4.6. Continuous-Time Problems - Control of Queues (p. 248); 4.7. Nonstationary and Periodic Problems (p. 256); 4.8. Notes, Sources, and Exercises (p. 261)
5. Average Cost per Stage Problems
5.1. Finite-Spaces Average Cost Models (p. 274); 5.1.1. Relation with the Discounted Cost Problem (p. 278); 5.1.2. Blackwell Optimal Policies (p. 284); 5.1.3. Optimality Equations (p. 294); 5.2. Conditions for Equal Average Cost for all Initial States (p. 298); 5.3. Value Iteration (p. 304); 5.3.1. Single-Chain Value Iteration (p. 307); 5.3.2. Multi-Chain Value Iteration (p. 322); 5.4. Policy Iteration (p. 329); 5.4.1. Single-Chain Policy Iteration (p. 329); 5.4.2. Multi-Chain Policy Iteration (p. 335); 5.5. Linear Programming (p. 339); 5.6. Infinite-Spaces Average Cost Models (p. 345); 5.6.1. A Sufficient Condition for Optimality (p. 353); 5.6.2. Finite State Space and Infinite Control Space (p. 355); 5.6.3. Countable States - Vanishing Discount Approach (p. 364); 5.6.4. Countable States - Contraction Approach (p. 367); 5.6.5. Linear Systems with Quadratic Cost (p. 372); 5.7. Notes, Sources, and Exercises (p. 374)
6. Approximate Dynamic Programming - Discounted Models
6.1. General Issues of Simulation-Based Cost Approximation (p. 391); 6.1.1. Approximation Architectures (p. 391); 6.1.2. Simulation-Based Approximate Policy Iteration (p. 397); 6.1.3. Direct and Indirect Approximation (p. 403); 6.1.4. Monte Carlo Simulation (p. 405); 6.1.5. Simplifications (p. 413); 6.2. Direct Policy Evaluation - Gradient Methods (p. 418); 6.3. Projected Equation Methods for Policy Evaluation (p. 423); 6.3.1. The Projected Bellman Equation (p. 424); 6.3.2. The Matrix Form of the Projected Equation (p. 428); 6.3.3. Simulation-Based Methods (p. 431); 6.3.4. LSTD, LSPE, and TD(0) Methods (p. 433); 6.3.5. Optimistic Versions (p. 437); 6.3.6. Multistep Simulation-Based Methods (p. 438); 6.3.7. A Synopsis (p. 447); 6.4. Policy Iteration Issues (p. 451); 6.4.1. Exploration Enhancement by Geometric Sampling (p. 453); 6.4.2. Exploration Enhancement by Off-Policy Methods (p. 464); 6.4.3. Policy Oscillations - Chattering (p. 467); 6.5. Aggregation Methods (p. 474); 6.5.1. Cost Approximation via the Aggregate Problem (p. 482); 6.5.2. Cost Approximation via the Enlarged Problem (p. 485); 6.5.3. Multistep Aggregation (p. 490); 6.5.4. Asynchronous Distributed Aggregation (p. 491); 6.6. Q-Learning (p. 493); 6.6.1. Q-Learning: A Stochastic VI Algorithm (p. 494); 6.6.2. Q-Learning and Policy Iteration (p. 496); 6.6.3. Q-Factor Approximation and Projected Equations (p. 499); 6.6.4. Q-Learning for Optimal Stopping Problems (p. 502); 6.6.5. Q-Learning and Aggregation (p. 507); 6.6.6. Finite Horizon Q-Learning (p. 509); 6.7. Notes, Sources, and Exercises (p. 511)
7. Approximate Dynamic Programming - Nondiscounted Models and Generalizations
7.1. Stochastic Shortest Path Problems (p. 532); 7.2. Average Cost Problems (p. 537); 7.2.1. Approximate Policy Evaluation (p. 537); 7.2.2. Approximate Policy Iteration (p. 546); 7.2.3. Q-Learning for Average Cost Problems (p. 548); 7.3. General Problems and Monte Carlo Linear Algebra (p. 552); 7.3.1. Projected Equations (p. 562); 7.3.2. Matrix Inversion and Iterative Methods (p. 569); 7.3.3. Multistep Methods (p. 576); 7.3.4. Extension of Q-Learning for Optimal Stopping (p. 584); 7.3.5. Equation Error Methods (p. 586); 7.3.6. Oblique Projections (p. 591); 7.3.7. Generalized Aggregation (p. 593); 7.3.8. Deterministic Methods for Singular Linear Systems (p. 597); 7.3.9. Stochastic Methods for Singular Linear Systems (p. 608); 7.4. Approximation in Policy Space (p. 620); 7.4.1. The Gradient Formula (p. 622); 7.4.2. Computing the Gradient by Simulation (p. 623); 7.4.3. Essential Features for Gradient Evaluation (p. 625); 7.4.4. Approximations in Policy and Value Space (p. 627); 7.5. Notes, Sources, and Exercises (p. 629)
Appendix A: Measure-Theoretic Issues in Dynamic Programming
A.1. A Two-Stage Example (p. 641); A.2. Resolution of the Measurability Issues (p. 646)
References (p. 657)
Index (p. 691)
any_adam_object	1
author	Bertsekas, Dimitri P. 1942-
author_GND	(DE-588)171165519
author_facet	Bertsekas, Dimitri P. 1942-
author_role	aut
author_sort	Bertsekas, Dimitri P. 1942-
author_variant	d p b dp dpb
building	Verbundindex
bvnumber	BV040463410
classification_rvk	SK 880 SM 613
classification_tum	MAT 917f
ctrlnum	(OCoLC)815947736 (DE-599)BVBBV040463410
discipline	Mathematik
edition	Fourth edition
format	Book
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01400nam a2200337 cc4500</leader><controlfield tag="001">BV040463410</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20171220 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">121010s2012 \|\|\|\| \|\|\|\| 00\|\|\| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1886529442</subfield><subfield code="9">1-886529-44-2</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781886529441</subfield><subfield code="9">978-1-886529-44-1</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)815947736</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV040463410</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-706</subfield><subfield code="a">DE-384</subfield><subfield code="a">DE-739</subfield><subfield code="a">DE-29T</subfield><subfield code="a">DE-573</subfield><subfield code="a">DE-188</subfield><subfield code="a">DE-83</subfield><subfield code="a">DE-523</subfield><subfield code="a">DE-634</subfield><subfield code="a">DE-91G</subfield><subfield code="a">DE-91</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">SK 880</subfield><subfield code="0">(DE-625)143266:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">SM 613</subfield><subfield code="0">(DE-625)143297:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">MAT 917f</subfield><subfield 
code="2">stub</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Bertsekas, Dimitri P.</subfield><subfield code="d">1942-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)171165519</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Dynamic programming and optimal control</subfield><subfield code="n">Volume 2</subfield><subfield code="p">Approximate dynamic programming</subfield><subfield code="c">Dimitri P. Bertsekas</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">Fourth edition</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Belmont, Mass.</subfield><subfield code="b">Athena Scientific</subfield><subfield code="c">[2012]</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xvii, 694 Seiten</subfield><subfield code="b">Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="w">(DE-604)BV011951112</subfield><subfield code="g">2</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025310808&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield 
code="a">oai:aleph.bib-bvb.de:BVB01-025310808</subfield></datafield></record></collection>
id	DE-604.BV040463410
illustrated	Not Illustrated
indexdate	2024-07-10T00:24:29Z
institution	BVB
isbn	1886529442 9781886529441
language	English
oai_aleph_id	oai:aleph.bib-bvb.de:BVB01-025310808
oclc_num	815947736
open_access_boolean
owner	DE-706 DE-384 DE-739 DE-29T DE-573 DE-188 DE-83 DE-523 DE-634 DE-91G DE-BY-TUM DE-91 DE-BY-TUM
owner_facet	DE-706 DE-384 DE-739 DE-29T DE-573 DE-188 DE-83 DE-523 DE-634 DE-91G DE-BY-TUM DE-91 DE-BY-TUM
physical	xvii, 694 Seiten Diagramme
publishDate	2012
publishDateSearch	2012
publishDateSort	2012
publisher	Athena Scientific
record_format	marc
spelling	Bertsekas, Dimitri P. 1942- Verfasser (DE-588)171165519 aut Dynamic programming and optimal control Volume 2 Approximate dynamic programming Dimitri P. Bertsekas Fourth edition Belmont, Mass. Athena Scientific [2012] xvii, 694 Seiten Diagramme txt rdacontent n rdamedia nc rdacarrier (DE-604)BV011951112 2 Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025310808&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis
spellingShingle	Bertsekas, Dimitri P. 1942- Dynamic programming and optimal control
title	Dynamic programming and optimal control
title_auth	Dynamic programming and optimal control
title_exact_search	Dynamic programming and optimal control
title_full	Dynamic programming and optimal control Volume 2 Approximate dynamic programming Dimitri P. Bertsekas
title_fullStr	Dynamic programming and optimal control Volume 2 Approximate dynamic programming Dimitri P. Bertsekas
title_full_unstemmed	Dynamic programming and optimal control Volume 2 Approximate dynamic programming Dimitri P. Bertsekas
title_short	Dynamic programming and optimal control
title_sort	dynamic programming and optimal control approximate dynamic programming
url	http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025310808&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA
volume_link	(DE-604)BV011951112
work_keys_str_mv	AT bertsekasdimitrip dynamicprogrammingandoptimalcontrolvolume2

Availability

No print copy is available.

Note: Not in the THWS holdings. Order via interlibrary loan.
