Algorithms for reinforcement learning
Author: | Szepesvári, Csaba, 1969- |
---|---|
Format: | Book |
Language: | English |
Published: | [San Rafael, California] : Morgan & Claypool Publishers, [2010] |
Series: | Synthesis lectures on artificial intelligence and machine learning ; 9 |
Subjects: | Reinforcement learning; Algorithms; Bestärkendes Lernen (Künstliche Intelligenz) [GND heading: reinforcement learning, artificial intelligence] |
Online access: | Table of contents; Jacket text |
Notes: | Bibliography on pages 73-88 |
Physical description: | xiii, 89 pages, illustrations |
ISBN: | 9781608454921 |
Internal format
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV042728899 | ||
003 | DE-604 | ||
005 | 20230131 | ||
007 | t | ||
008 | 150731s2010 a||| |||| 00||| eng d | ||
020 | |a 9781608454921 |c Pb.: £ 19.99 |9 978-1-60845-492-1 | ||
024 | 3 | |a 9781608454921 | |
035 | |a (OCoLC)732200221 | ||
035 | |a (DE-599)BSZ330543571 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-706 |a DE-523 |a DE-91 |a DE-703 |a DE-634 |a DE-355 | ||
082 | 0 | |a 006.31 | |
084 | |a ST 300 |0 (DE-625)143650: |2 rvk | ||
084 | |a ST 304 |0 (DE-625)143653: |2 rvk | ||
084 | |a 68W05 |2 msc | ||
084 | |a 90C40 |2 msc | ||
084 | |a 90C39 |2 msc | ||
084 | |a 68T05 |2 msc | ||
084 | |a 68-02 |2 msc | ||
084 | |a DAT 708f |2 stub | ||
100 | 1 | |a Szepesvári, Csaba |d 1969- |e Verfasser |0 (DE-588)1082754714 |4 aut | |
245 | 1 | 0 | |a Algorithms for reinforcement learning |c Csaba Szepesvári, University of Alberta |
264 | 1 | |a [San Rafael, California] |b Morgan & Claypool Publishers |c [2010] | |
300 | |a xiii, 89 Seiten |b Illustrationen | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a Synthesis lectures on artificial intelligence and machine learning |v 9 | |
500 | |a Literaturverzeichnis Seite 73-88 | ||
650 | 4 | |a Reinforcement learning | |
650 | 4 | |a Algorithms | |
650 | 0 | 7 | |a Bestärkendes Lernen |g Künstliche Intelligenz |0 (DE-588)4825546-4 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Bestärkendes Lernen |g Künstliche Intelligenz |0 (DE-588)4825546-4 |D s |
689 | 0 | |5 DE-604 | |
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe |z 978-1-60845-493-8 |
830 | 0 | |a Synthesis lectures on artificial intelligence and machine learning |v 9 |w (DE-604)BV035750800 |9 9 | |
856 | 4 | 2 | |m Digitalisierung UB Regensburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028159951&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m Digitalisierung UB Regensburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028159951&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Klappentext |
999 | |a oai:aleph.bib-bvb.de:BVB01-028159951 |
Search index record
_version_ | 1804174939464400896 |
---|---|
adam_text | Contents
Preface, p. ix
Acknowledgments, p. xiii
1 Markov Decision Processes, p. 1
  1.1 Preliminaries, p. 1
  1.2 Markov Decision Processes, p. 1
  1.3 Value functions, p. 6
  1.4 Dynamic programming algorithms for solving MDPs, p. 10
2 Value Prediction Problems, p. 11
  2.1 Temporal difference learning in finite state spaces, p. 11
    2.1.1 Tabular TD(0), p. 11
    2.1.2 Every-visit Monte-Carlo, p. 14
    2.1.3 TD(λ): Unifying Monte-Carlo and TD(0), p. 16
  2.2 Algorithms for large state spaces, p. 18
    2.2.1 TD(λ) with function approximation, p. 22
    2.2.2 Gradient temporal difference learning, p. 25
    2.2.3 Least-squares methods, p. 27
    2.2.4 The choice of the function space, p. 33
3 Control, p. 37
  3.1 A catalog of learning problems, p. 37
  3.2 Closed-loop interactive learning, p. 38
    3.2.1 Online learning in bandits, p. 38
    3.2.2 Active learning in bandits, p. 40
    3.2.3 Active learning in Markov Decision Processes, p. 41
    3.2.4 Online learning in Markov Decision Processes, p. 42
  3.3 Direct methods, p. 47
    3.3.1 Q-learning in finite MDPs, p. 47
    3.3.2 Q-learning with function approximation, p. 49
  3.4 Actor-critic methods, p. 52
    3.4.1 Implementing a critic, p. 54
    3.4.2 Implementing an actor, p. 56
4 For further exploration, p. 63
  4.1 Further reading, p. 63
  4.2 Applications, p. 63
  4.3 Software, p. 64
A The theory of discounted Markovian decision processes, p. 65
  A.1 Contractions and Banach’s fixed-point theorem, p. 65
  A.2 Application to MDPs, p. 69
Bibliography, p. 73
Author’s Biography, p. 89
Algorithms for Reinforcement Learning
Csaba Szepesvári, University of Alberta
Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner’s predictions. Further, the predictions may have long term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms’ merits and limitations. Reinforcement learning is of great interest because of the large number of practical applications that it can be used to address, ranging from problems in artificial intelligence to operations research or control engineering. In this book, we focus on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming. We give a fairly comprehensive catalog of learning problems, describe the core ideas together with a large number of state of the art algorithms, followed by the discussion of their theoretical properties and limitations.
|
any_adam_object | 1 |
author | Szepesvári, Csaba 1969- |
author_GND | (DE-588)1082754714 |
author_facet | Szepesvári, Csaba 1969- |
author_role | aut |
author_sort | Szepesvári, Csaba 1969- |
author_variant | c s cs |
building | Union catalog index |
bvnumber | BV042728899 |
classification_rvk | ST 300 ST 304 |
classification_tum | DAT 708f |
ctrlnum | (OCoLC)732200221 (DE-599)BSZ330543571 |
dewey-full | 006.31 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.31 |
dewey-search | 006.31 |
dewey-sort | 16.31 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Computer science |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02313nam a2200505 cb4500</leader><controlfield tag="001">BV042728899</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20230131 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">150731s2010 a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781608454921</subfield><subfield code="c">Pb.: £ 19.99</subfield><subfield code="9">978-1-60845-492-1</subfield></datafield><datafield tag="024" ind1="3" ind2=" "><subfield code="a">9781608454921</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)732200221</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BSZ330543571</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-706</subfield><subfield code="a">DE-523</subfield><subfield code="a">DE-91</subfield><subfield code="a">DE-703</subfield><subfield code="a">DE-634</subfield><subfield code="a">DE-355</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.31</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 300</subfield><subfield code="0">(DE-625)143650:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 304</subfield><subfield code="0">(DE-625)143653:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">68W05</subfield><subfield code="2">msc</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield 
code="a">90C40</subfield><subfield code="2">msc</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">90C39</subfield><subfield code="2">msc</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">68T05</subfield><subfield code="2">msc</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">68-02</subfield><subfield code="2">msc</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DAT 708f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Szepesvári, Csaba</subfield><subfield code="d">1969-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1082754714</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Algorithms for reinforcement learning</subfield><subfield code="c">Csaba Szepesvári, University of Alberta</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">[San Rafael, California]</subfield><subfield code="b">Morgan & Claypool Publishers</subfield><subfield code="c">[2010]</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xiii, 89 Seiten</subfield><subfield code="b">Illustrationen</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Synthesis lectures on artificial intelligence and machine learning</subfield><subfield code="v">9</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Literaturverzeichnis Seite 73-88</subfield></datafield><datafield 
tag="650" ind1=" " ind2="4"><subfield code="a">Reinforcement learning</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Algorithms</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Bestärkendes Lernen</subfield><subfield code="g">Künstliche Intelligenz</subfield><subfield code="0">(DE-588)4825546-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Bestärkendes Lernen</subfield><subfield code="g">Künstliche Intelligenz</subfield><subfield code="0">(DE-588)4825546-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="z">978-1-60845-493-8</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Synthesis lectures on artificial intelligence and machine learning</subfield><subfield code="v">9</subfield><subfield code="w">(DE-604)BV035750800</subfield><subfield code="9">9</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028159951&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield 
code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028159951&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Klappentext</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-028159951</subfield></datafield></record></collection> |
id | DE-604.BV042728899 |
illustrated | Illustrated |
indexdate | 2024-07-10T07:08:22Z |
institution | BVB |
isbn | 9781608454921 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-028159951 |
oclc_num | 732200221 |
open_access_boolean | |
owner | DE-706 DE-523 DE-91 DE-BY-TUM DE-703 DE-634 DE-355 DE-BY-UBR |
owner_facet | DE-706 DE-523 DE-91 DE-BY-TUM DE-703 DE-634 DE-355 DE-BY-UBR |
physical | xiii, 89 Seiten Illustrationen |
publishDate | 2010 |
publishDateSearch | 2010 |
publishDateSort | 2010 |
publisher | Morgan & Claypool Publishers |
record_format | marc |
series | Synthesis lectures on artificial intelligence and machine learning |
series2 | Synthesis lectures on artificial intelligence and machine learning |
spelling | Szepesvári, Csaba 1969- Verfasser (DE-588)1082754714 aut Algorithms for reinforcement learning Csaba Szepesvári, University of Alberta [San Rafael, California] Morgan & Claypool Publishers [2010] xiii, 89 Seiten Illustrationen txt rdacontent n rdamedia nc rdacarrier Synthesis lectures on artificial intelligence and machine learning 9 Literaturverzeichnis Seite 73-88 Reinforcement learning Algorithms Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd rswk-swf Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 s DE-604 Erscheint auch als Online-Ausgabe 978-1-60845-493-8 Synthesis lectures on artificial intelligence and machine learning 9 (DE-604)BV035750800 9 Digitalisierung UB Regensburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028159951&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis Digitalisierung UB Regensburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028159951&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA Klappentext |
spellingShingle | Szepesvári, Csaba 1969- Algorithms for reinforcement learning Synthesis lectures on artificial intelligence and machine learning Reinforcement learning Algorithms Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd |
subject_GND | (DE-588)4825546-4 |
title | Algorithms for reinforcement learning |
title_auth | Algorithms for reinforcement learning |
title_exact_search | Algorithms for reinforcement learning |
title_full | Algorithms for reinforcement learning Csaba Szepesvári, University of Alberta |
title_fullStr | Algorithms for reinforcement learning Csaba Szepesvári, University of Alberta |
title_full_unstemmed | Algorithms for reinforcement learning Csaba Szepesvári, University of Alberta |
title_short | Algorithms for reinforcement learning |
title_sort | algorithms for reinforcement learning |
topic | Reinforcement learning Algorithms Bestärkendes Lernen Künstliche Intelligenz (DE-588)4825546-4 gnd |
topic_facet | Reinforcement learning Algorithms Bestärkendes Lernen Künstliche Intelligenz |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028159951&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028159951&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
volume_link | (DE-604)BV035750800 |
work_keys_str_mv | AT szepesvaricsaba algorithmsforreinforcementlearning |