Simulation-based algorithms for Markov decision processes
Format: | Book |
---|---|
Language: | English |
Published: | London [et al.] : Springer, 2013 |
Edition: | 2nd ed. |
Series: | Communications and control engineering |
Subjects: | |
Online access: | Table of contents |
Description: | XVII, 229 pp., illustrations, 235 mm x 155 mm |
ISBN: | 9781447150213 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV041269770 | ||
003 | DE-604 | ||
005 | 20141202 | ||
007 | t | ||
008 | 130913s2013 d||| |||| 00||| eng d | ||
020 | |a 9781447150213 |9 978-1-4471-5021-3 | ||
035 | |a (OCoLC)862801521 | ||
035 | |a (DE-599)BVBBV041269770 | ||
040 | |a DE-604 |b ger |e rakddb | ||
041 | 0 | |a eng | |
049 | |a DE-355 |a DE-384 | ||
050 | 0 | |a HD30.23 | |
082 | 0 | |a 658.4033 |2 22 | |
084 | |a QH 233 |0 (DE-625)141548: |2 rvk | ||
084 | |a SK 820 |0 (DE-625)143258: |2 rvk | ||
084 | |a 510 |2 sdnb | ||
245 | 1 | 0 | |a Simulation-based algorithms for Markov decision processes |c Hyeong Soo Chang ... |
250 | |a 2. ed. | ||
264 | 1 | |a London [u.a.] |b Springer |c 2013 | |
300 | |a XVII, 229 S. |b graph. Darst. |c 235 mm x 155 mm | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Communications and control engineering | |
650 | 7 | |a Algoritmen |2 gtt | |
650 | 7 | |a Markov-beslissingsproblemen |2 gtt | |
650 | 4 | |a Prise de décision - Modèles mathématiques | |
650 | 4 | |a Processus de Markov | |
650 | 7 | |a Simulatiemodellen |2 gtt | |
650 | 4 | |a Mathematisches Modell | |
650 | 4 | |a Decision making |x Mathematical models | |
650 | 4 | |a Markov processes | |
650 | 0 | 7 | |a Markov-Entscheidungsprozess |0 (DE-588)4168927-6 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Evolutionärer Algorithmus |0 (DE-588)4366912-8 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Markov-Entscheidungsprozess |0 (DE-588)4168927-6 |D s |
689 | 0 | 1 | |a Evolutionärer Algorithmus |0 (DE-588)4366912-8 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Chang, Hyeong Soo |e Sonstige |4 oth | |
856 | 4 | 2 | |m Digitalisierung UB Regensburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=026243402&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-026243402 |
Record in search index
_version_ | 1804150735006334976 |
---|---|
adam_text | Contents
1 Markov Decision Processes, p. 1
1.1 Optimality Equations, p. 3
1.2 Policy Iteration and Value Iteration, p. 5
1.3 Rolling-Horizon Control, p. 7
1.4 Survey of Previous Work on Computational Methods, p. 8
1.5 Simulation, p. 10
1.6 Preview of Coming Attractions, p. 13
1.7 Notes, p. 14
2 Multi-stage Adaptive Sampling Algorithms, p. 19
2.1 Upper Confidence Bound Sampling, p. 21
2.1.1 Regret Analysis in Multi-armed Bandits, p. 21
2.1.2 Algorithm Description, p. 22
2.1.3 Alternative Estimators, p. 25
2.1.4 Convergence Analysis, p. 25
2.1.5 Numerical Example, p. 33
2.2 Pursuit Learning Automata Sampling, p. 37
2.2.1 Algorithm Description, p. 42
2.2.2 Convergence Analysis, p. 44
2.2.3 Application to POMDPs, p. 52
2.2.4 Numerical Example, p. 54
2.3 Notes, p. 57
3 Population-Based Evolutionary Approaches, p. 61
3.1 Evolutionary Policy Iteration, p. 63
3.1.1 Policy Switching, p. 63
3.1.2 Policy Mutation and Population Generation, p. 65
3.1.3 Stopping Rule, p. 65
3.1.4 Convergence Analysis, p. 66
3.1.5 Parallelization, p. 67
3.2 Evolutionary Random Policy Search, p. 67
3.2.1 Policy Improvement with Reward Swapping, p. 68
3.2.2 Exploration, p. 71
3.2.3 Convergence Analysis, p. 73
3.3 Numerical Examples, p. 76
3.3.1 A One-Dimensional Queueing Example, p. 76
3.3.2 A Two-Dimensional Queueing Example, p. 83
3.4 Extension to Simulation-Based Setting, p. 86
3.5 Notes, p. 87
4 Model Reference Adaptive Search, p. 89
4.1 The Model Reference Adaptive Search Method, p. 91
4.1.1 The MRAS₀ Algorithm (Idealized Version), p. 92
4.1.2 The MRAS₁ Algorithm (Adaptive Monte Carlo Version), p. 96
4.1.3 The MRAS₂ Algorithm (Stochastic Optimization), p. 98
4.2 Convergence Analysis of MRAS, p. 101
4.2.1 MRAS₀ Convergence, p. 101
4.2.2 MRAS₁ Convergence, p. 107
4.2.3 MRAS₂ Convergence, p. 117
4.3 Application of MRAS to MDPs via Direct Policy Learning, p. 131
4.3.1 Finite-Horizon MDPs, p. 131
4.3.2 Infinite-Horizon MDPs, p. 132
4.3.3 MDPs with Large State Spaces, p. 132
4.3.4 Numerical Examples, p. 135
4.4 Application of MRAS to Infinite-Horizon MDPs in Population-Based Evolutionary Approaches, p. 141
4.4.1 Algorithm Description, p. 142
4.4.2 Numerical Examples, p. 143
4.5 Application of MRAS to Finite-Horizon MDPs Using Adaptive Sampling, p. 144
4.6 A Stochastic Approximation Framework, p. 148
4.6.1 Model-Based Annealing Random Search, p. 149
4.6.2 Application of MARS to Finite-Horizon MDPs, p. 166
4.7 Notes, p. 177
5 On-Line Control Methods via Simulation, p. 179
5.1 Simulated Annealing Multiplicative Weights Algorithm, p. 183
5.1.1 Basic Algorithm Description, p. 184
5.1.2 Convergence Analysis, p. 185
5.1.3 Convergence of the Sampling Version of the Algorithm, p. 189
5.1.4 Numerical Example, p. 191
5.1.5 Simulated Policy Switching, p. 194
5.2 Rollout, p. 195
5.2.1 Parallel Rollout, p. 197
5.3 Hindsight Optimization, p. 199
5.3.1 Numerical Example, p. 200
5.4 Approximate Stochastic Annealing, p. 204
5.4.1 Convergence Analysis, p. 207
5.4.2 Numerical Example, p. 215
5.5 Notes, p. 216
References, p. 219
Index, p. 227
|
any_adam_object | 1 |
building | Verbundindex |
bvnumber | BV041269770 |
callnumber-first | H - Social Science |
callnumber-label | HD30 |
callnumber-raw | HD30.23 |
callnumber-search | HD30.23 |
callnumber-sort | HD 230.23 |
callnumber-subject | HD - Industries, Land Use, Labor |
classification_rvk | QH 233 SK 820 |
ctrlnum | (OCoLC)862801521 (DE-599)BVBBV041269770 |
dewey-full | 658.4033 |
dewey-hundreds | 600 - Technology (Applied sciences) |
dewey-ones | 658 - General management |
dewey-raw | 658.4033 |
dewey-search | 658.4033 |
dewey-sort | 3658.4033 |
dewey-tens | 650 - Management and auxiliary services |
discipline | Mathematics, Economics |
edition | 2. ed. |
format | Book |
id | DE-604.BV041269770 |
illustrated | Illustrated |
indexdate | 2024-07-10T00:43:38Z |
institution | BVB |
isbn | 9781447150213 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-026243402 |
oclc_num | 862801521 |
open_access_boolean | |
owner | DE-355 DE-BY-UBR DE-384 |
owner_facet | DE-355 DE-BY-UBR DE-384 |
physical | XVII, 229 pp., illustrations, 235 mm x 155 mm |
publishDate | 2013 |
publishDateSearch | 2013 |
publishDateSort | 2013 |
publisher | Springer |
record_format | marc |
series2 | Communications and control engineering |
subject_GND | (DE-588)4168927-6 (DE-588)4366912-8 |
title | Simulation-based algorithms for Markov decision processes |
title_auth | Simulation-based algorithms for Markov decision processes |
title_exact_search | Simulation-based algorithms for Markov decision processes |
title_full | Simulation-based algorithms for Markov decision processes Hyeong Soo Chang ... |
title_fullStr | Simulation-based algorithms for Markov decision processes Hyeong Soo Chang ... |
title_full_unstemmed | Simulation-based algorithms for Markov decision processes Hyeong Soo Chang ... |
title_short | Simulation-based algorithms for Markov decision processes |
title_sort | simulation based algorithms for markov decision processes |
topic | Algoritmen gtt Markov-beslissingsproblemen gtt Prise de décision - Modèles mathématiques Processus de Markov Simulatiemodellen gtt Mathematisches Modell Decision making Mathematical models Markov processes Markov-Entscheidungsprozess (DE-588)4168927-6 gnd Evolutionärer Algorithmus (DE-588)4366912-8 gnd |
topic_facet | Algoritmen Markov-beslissingsproblemen Prise de décision - Modèles mathématiques Processus de Markov Simulatiemodellen Mathematisches Modell Decision making Mathematical models Markov processes Markov-Entscheidungsprozess Evolutionärer Algorithmus |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=026243402&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT changhyeongsoo simulationbasedalgorithmsformarkovdecisionprocesses |