From shortest paths to reinforcement learning: a MATLAB-based tutorial on dynamic programming
Author:        Brandimarte, Paolo 1963-
Format:        Book
Language:      English
Published:     Cham : Springer, [2021]
Series:        EURO Advanced Tutorials on Operational Research
Subjects:      Dynamische Optimierung
Online access: Table of contents
Description:   xi, 207 pages, diagrams
ISBN:          3030618668; 9783030618667; 9783030618698
ISSN:          2364-687X
Internal format
MARC
LEADER 00000nam a2200000 c 4500
001    BV047077795
003    DE-604
005    20220922
007    t
008    210105s2021 |||| |||| 00||| eng d
020 __ |a 3030618668 |9 3030618668
020 __ |a 9783030618667 |c hc |9 978-3-030-61866-7
020 __ |a 9783030618698 |c pbk |9 978-3-030-61869-8
035 __ |a (OCoLC)1235890058
035 __ |a (DE-599)BVBBV047077795
040 __ |a DE-604 |b ger |e rda
041 0_ |a eng
049 __ |a DE-384 |a DE-20 |a DE-N2
084 __ |a QH 423 |0 (DE-625)141577: |2 rvk
084 __ |a ST 601 |0 (DE-625)143682: |2 rvk
100 1_ |a Brandimarte, Paolo |d 1963- |e Verfasser |0 (DE-588)1025449061 |4 aut
245 10 |a From shortest paths to reinforcement learning |b a MATLAB-based tutorial on dynamic programming |c Paolo Brandimarte
264 _1 |a Cham |b Springer |c [2021]
264 _4 |c © 2021
300 __ |a xi, 207 Seiten |b Diagramme
336 __ |b txt |2 rdacontent
337 __ |b n |2 rdamedia
338 __ |b nc |2 rdacarrier
490 0_ |a EURO Advanced Tutorials on Operational Research |x 2364-687X
650 07 |a Dynamische Optimierung |0 (DE-588)4125677-3 |2 gnd |9 rswk-swf
689 00 |a Dynamische Optimierung |0 (DE-588)4125677-3 |D s
689 0_ |5 DE-604
776 08 |i Erscheint auch als |n Online-Ausgabe |z 978-3-030-61867-4
856 42 |m Digitalisierung UB Augsburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032484700&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
999 __ |a oai:aleph.bib-bvb.de:BVB01-032484700
Record in the search index
_version_ | 1804182085734236160 |
adam_text |

Contents

1 The Dynamic Programming Principle ............................................. 1
  1.1 What Is Dynamic Programming? .............................................. 2
  1.2 Dynamic Decision Problems ................................................. 4
      1.2.1 Finite Horizon, Discounted Problems ................................ 8
      1.2.2 Infinite Horizon, Discounted Problems .............................. 10
      1.2.3 Infinite Horizon, Average Contribution Per Stage Problems .......... 10
      1.2.4 Problems with an Undefined Horizon ................................. 11
      1.2.5 Decision Policies .................................................. 11
  1.3 An Example: Dynamic Single-Item Lot-Sizing ................................ 13
  1.4 A Glimpse of the DP Principle: The Shortest Path Problem .................. 16
      1.4.1 Forward vs. Backward DP ............................................ 22
      1.4.2 Shortest Paths on Structured Networks .............................. 24
      1.4.3 Stochastic Shortest Paths .......................................... 25
  1.5 The DP Decomposition Principle ............................................ 26
      1.5.1 Stochastic DP for Finite Time Horizons ............................. 29
      1.5.2 Stochastic DP for Infinite Time Horizons ........................... 31
  1.6 For Further Reading ....................................................... 32
  References .................................................................... 33

2 Implementing Dynamic Programming .............................................. 35
  2.1 Discrete Resource Allocation: The Knapsack Problem ........................ 36
  2.2 Continuous Budget Allocation .............................................. 41
      2.2.1 Interlude: Function Interpolation by Cubic Splines in MATLAB ....... 44
      2.2.2 Solving the Continuous Budget Allocation Problem by Numerical DP ... 48
  2.3 Stochastic Inventory Control .............................................. 51
  2.4 Exploiting Structure ...................................................... 56
      2.4.1 Using Shortest Paths to Solve the Deterministic Lot-Sizing Problem . 57
      2.4.2 Stochastic Lot-Sizing: S and (s, S) Policies ....................... 60
      2.4.3 Structural Properties of Value Functions ........................... 63
  2.5 The Curses of Dynamic Programming ......................................... 64
      2.5.1 The Curse of State Dimensionality .................................. 64
      2.5.2 The Curse of Optimization .......................................... 65
      2.5.3 The Curse of Expectation ........................................... 65
      2.5.4 The Curse of Modeling .............................................. 65
  2.6 For Further Reading ....................................................... 66
  References .................................................................... 66

3 Modeling for Dynamic Programming .............................................. 67
  3.1 Finite Markov Decision Processes .......................................... 68
  3.2 Different Shades of Stochastic DP ......................................... 74
      3.2.1 Post-decision State Variables ...................................... 76
  3.3 Variations on Inventory Management ........................................ 78
      3.3.1 Deterministic Lead Time ............................................ 78
      3.3.2 Perishable Items ................................................... 79
  3.4 Revenue Management ........................................................ 81
      3.4.1 Static Model with Perfect Demand Segmentation ...................... 83
      3.4.2 Dynamic Model with Perfect Demand Segmentation ..................... 86
      3.4.3 Dynamic Model with Customer Choice ................................. 87
  3.5 Pricing Financial Options with Early Exercise Features .................... 88
      3.5.1 Bias Issues in Dynamic Programming ................................. 92
  3.6 Consumption-Saving with Uncertain Labor Income ............................ 93
  3.7 For Further Reading ....................................................... 96
  References .................................................................... 96

4 Numerical Dynamic Programming for Discrete States ............................. 99
  4.1 Discrete-Time Markov Chains ............................................... 100
  4.2 Markov Decision Processes with a Finite Time Horizon ...................... 102
      4.2.1 A Numerical Example: Random Walks and Optimal Stopping ............. 103
  4.3 Markov Decision Processes with an Infinite Time Horizon ................... 107
  4.4 Value Iteration ........................................................... 109
      4.4.1 A Numerical Example of Value Iteration ............................. 111
  4.5 Policy Iteration .......................................................... 117
      4.5.1 A Numerical Example of Policy Iteration ............................ 122
  4.6 Value vs. Policy Iteration ................................................ 123
  4.7 Average Contribution Per Stage ............................................ 125
      4.7.1 Relative Value Iteration for Problems Involving Average
            Contributions Per Stage ............................................ 130
      4.7.2 Policy Iteration for Problems Involving Average Contributions
            Per Stage .......................................................... 132
      4.7.3 An Example of Application to Preventive Maintenance ................ 135
  4.8 For Further Reading ....................................................... 139
  References .................................................................... 140

5 Approximate Dynamic Programming and Reinforcement Learning for
  Discrete States ............................................................... 141
  5.1 Sampling and Estimation in Non-stationary Settings ........................ 143
      5.1.1 The Exploration vs. Exploitation Tradeoff .......................... 144
      5.1.2 Non-stationarity and Exponential Smoothing ......................... 147
  5.2 Learning by Temporal Differences and SARSA ................................ 149
  5.3 Q-Learning for Finite MDPs ................................................ 151
      5.3.1 A Numerical Example ................................................ 156
  5.4 For Further Reading ....................................................... 159
  References .................................................................... 160

6 Numerical Dynamic Programming for Continuous States ........................... 161
  6.1 Solving Finite Horizon Problems by Standard Numerical Methods ............. 162
  6.2 A Numerical Approach to Consumption-Saving ................................ 164
      6.2.1 Approximating the Optimal Policy by Numerical DP ................... 165
      6.2.2 Optimizing a Fixed Policy .......................................... 171
      6.2.3 Computational Experiments .......................................... 175
  6.3 Computational Refinements and Extensions .................................. 180
  6.4 For Further Reading ....................................................... 182
  References .................................................................... 182

7 Approximate Dynamic Programming and Reinforcement Learning for
  Continuous States ............................................................. 185
  7.1 Option Pricing by ADP and Linear Regression ............................... 186
  7.2 A Basic Framework for ADP ................................................. 193
  7.3 Least-Squares Policy Iteration ............................................ 197
  7.4 For Further Reading ....................................................... 203
  References .................................................................... 204

Index ........................................................................... 205
|
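The table of contents above names the classic solution algorithms the book develops, such as value iteration (Sect. 4.4). For flavor only, here is a minimal MATLAB sketch of value iteration on a small, randomly generated finite Markov decision process. It is not taken from the book; the variable names (P, R, gamma, tol) and the random test instance are this record's own illustrative choices.

% Illustrative sketch only, not from the book: value iteration for a
% small finite MDP with nS states and nA actions.
% P(s, s2, a): probability of moving from state s to s2 under action a.
% R(s, a):     expected immediate contribution of action a in state s.
rng(42);                                  % reproducible random test instance
nS = 4; nA = 2; gamma = 0.95; tol = 1e-8;
P = rand(nS, nS, nA);
P = P ./ sum(P, 2);                       % normalize rows: each P(:,:,a) is stochastic
R = rand(nS, nA);
V = zeros(nS, 1);                         % initial guess of the value function
while true
    Q = zeros(nS, nA);
    for a = 1:nA
        Q(:, a) = R(:, a) + gamma * P(:, :, a) * V;   % Bellman backup per action
    end
    [Vnew, policy] = max(Q, [], 2);       % greedy maximization over actions
    if max(abs(Vnew - V)) < tol, break; end
    V = Vnew;                             % iterate until sup-norm convergence
end
disp([(1:nS)', V, policy])                % state, value estimate, greedy action

Because the Bellman operator is a sup-norm contraction with modulus gamma < 1, the loop is guaranteed to terminate for any positive tolerance.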
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Brandimarte, Paolo 1963- |
author_GND | (DE-588)1025449061 |
author_facet | Brandimarte, Paolo 1963- |
author_role | aut |
author_sort | Brandimarte, Paolo 1963- |
author_variant | p b pb |
building | Verbundindex |
bvnumber | BV047077795 |
classification_rvk | QH 423 ST 601 |
ctrlnum | (OCoLC)1235890058 (DE-599)BVBBV047077795 |
discipline | Informatik Wirtschaftswissenschaften |
discipline_str_mv | Informatik Wirtschaftswissenschaften |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01604nam a2200385 c 4500</leader><controlfield tag="001">BV047077795</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20220922 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">210105s2021 |||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">3030618668</subfield><subfield code="9">3030618668</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9783030618667</subfield><subfield code="c">hc</subfield><subfield code="9">978-3-030-61866-7</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9783030618698</subfield><subfield code="c">pbk</subfield><subfield code="9">978-3-030-61869-8</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1235890058</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV047077795</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-384</subfield><subfield code="a">DE-20</subfield><subfield code="a">DE-N2</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">QH 423</subfield><subfield code="0">(DE-625)141577:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 601</subfield><subfield code="0">(DE-625)143682:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Brandimarte, Paolo</subfield><subfield code="d">1963-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1025449061</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">From shortest paths to reinforcement learning</subfield><subfield code="b">a MATLAB-based tutorial on dynamic programming</subfield><subfield code="c">Paolo Brandimarte</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Cham</subfield><subfield code="b">Springer</subfield><subfield code="c">[2021]</subfield></datafield><datafield tag="264" ind1=" " ind2="4"><subfield code="c">© 2021</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xi, 207 Seiten</subfield><subfield code="b">Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">EURO Advanced Tutorials on Operational Research</subfield><subfield code="x">2364-687X</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Dynamische Optimierung</subfield><subfield code="0">(DE-588)4125677-3</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Dynamische Optimierung</subfield><subfield 
code="0">(DE-588)4125677-3</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="z">978-3-030-61867-4</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Augsburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032484700&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-032484700</subfield></datafield></record></collection> |
id | DE-604.BV047077795 |
illustrated | Not Illustrated |
index_date | 2024-07-03T16:15:23Z |
indexdate | 2024-07-10T09:01:57Z |
institution | BVB |
isbn | 3030618668 9783030618667 9783030618698 |
issn | 2364-687X |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-032484700 |
oclc_num | 1235890058 |
open_access_boolean | |
owner | DE-384 DE-20 DE-N2 |
owner_facet | DE-384 DE-20 DE-N2 |
physical | xi, 207 Seiten Diagramme |
publishDate | 2021 |
publishDateSearch | 2021 |
publishDateSort | 2021 |
publisher | Springer |
record_format | marc |
series2 | EURO Advanced Tutorials on Operational Research |
spelling | Brandimarte, Paolo 1963- Verfasser (DE-588)1025449061 aut From shortest paths to reinforcement learning a MATLAB-based tutorial on dynamic programming Paolo Brandimarte Cham Springer [2021] © 2021 xi, 207 Seiten Diagramme txt rdacontent n rdamedia nc rdacarrier EURO Advanced Tutorials on Operational Research 2364-687X Dynamische Optimierung (DE-588)4125677-3 gnd rswk-swf Dynamische Optimierung (DE-588)4125677-3 s DE-604 Erscheint auch als Online-Ausgabe 978-3-030-61867-4 Digitalisierung UB Augsburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032484700&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Brandimarte, Paolo 1963- From shortest paths to reinforcement learning a MATLAB-based tutorial on dynamic programming Dynamische Optimierung (DE-588)4125677-3 gnd |
subject_GND | (DE-588)4125677-3 |
title | From shortest paths to reinforcement learning a MATLAB-based tutorial on dynamic programming |
title_auth | From shortest paths to reinforcement learning a MATLAB-based tutorial on dynamic programming |
title_exact_search | From shortest paths to reinforcement learning a MATLAB-based tutorial on dynamic programming |
title_exact_search_txtP | From shortest paths to reinforcement learning a MATLAB-based tutorial on dynamic programming |
title_full | From shortest paths to reinforcement learning a MATLAB-based tutorial on dynamic programming Paolo Brandimarte |
title_fullStr | From shortest paths to reinforcement learning a MATLAB-based tutorial on dynamic programming Paolo Brandimarte |
title_full_unstemmed | From shortest paths to reinforcement learning a MATLAB-based tutorial on dynamic programming Paolo Brandimarte |
title_short | From shortest paths to reinforcement learning |
title_sort | from shortest paths to reinforcement learning a matlab based tutorial on dynamic programming |
title_sub | a MATLAB-based tutorial on dynamic programming |
topic | Dynamische Optimierung (DE-588)4125677-3 gnd |
topic_facet | Dynamische Optimierung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032484700&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT brandimartepaolo fromshortestpathstoreinforcementlearningamatlabbasedtutorialondynamicprogramming |