Foundations of Reinforcement Learning with Applications in Finance

Saved in:

Main author: | Rao, Ashwin |
---|---|
Format: | Electronic eBook |
Language: | English |
Published: | Milton : CRC Press LLC, 2022 |
Subjects: | |
Online access: | HWR01 |
Description: | 1 online resource (522 pages) |
ISBN: | 9781000801101 9781032124124 |
Internal format
MARC
LEADER | 00000nmm a2200000 c 4500 | ||
---|---|---|---|
001 | BV048632482 | ||
003 | DE-604 | ||
005 | 00000000000000.0 | ||
007 | cr|uuu---uuuuu | ||
008 | 230105s2022 |||| o||u| ||||||eng d | ||
020 | |a 9781000801101 |q electronic bk. |9 978-1-00-080110-1 | ||
020 | |a 9781032124124 |9 978-1-03-212412-4 | ||
035 | |a (ZDB-30-PQE)EBC7121499 | ||
035 | |a (ZDB-30-PAD)EBC7121499 | ||
035 | |a (ZDB-89-EBL)EBL7121499 | ||
035 | |a (OCoLC)1349282032 | ||
035 | |a (DE-599)BVBBV048632482 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-2070s | ||
100 | 1 | |a Rao, Ashwin |e Verfasser |4 aut | |
245 | 1 | 0 | |a Foundations of Reinforcement Learning with Applications in Finance |
264 | 1 | |a Milton |b CRC Press LLC |c 2022 | |
264 | 4 | |c ©2023 | |
300 | |a 1 Online-Ressource (522 Seiten) | ||
336 | |b txt |2 rdacontent | ||
337 | |b c |2 rdamedia | ||
338 | |b cr |2 rdacarrier | ||
505 | 8 | |a Cover -- Half Title -- Title Page -- Copyright Page -- Contents -- Preface -- Author Biographies -- Summary of Notation -- CHAPTER 1: Overview -- 1.1. LEARNING REINFORCEMENT LEARNING -- 1.2. WHAT YOU WILL LEARN FROM THIS BOOK -- 1.3. EXPECTED BACKGROUND TO READ THIS BOOK -- 1.4. DECLUTTERING THE JARGON LINKED TO REINFORCEMENT LEARNING -- 1.5. INTRODUCTION TO THE MARKOV DECISION PROCESS (MDP) FRAMEWORK -- 1.6. REAL-WORLD PROBLEMS THAT FIT THE MDP FRAMEWORK -- 1.7. THE INHERENT DIFFICULTY IN SOLVING MDPS -- 1.8. VALUE FUNCTION, BELLMAN EQUATIONS, DYNAMIC PROGRAMMING AND RL -- 1.9. OUTLINE OF CHAPTERS -- 1.9.1. Module I: Processes and Planning Algorithms -- 1.9.2. Module II: Modeling Financial Applications -- 1.9.3. Module III: Reinforcement Learning Algorithms -- 1.9.4. Module IV: Finishing Touches -- 1.9.5. Short Appendix Chapters -- CHAPTER 2: Programming and Design -- 2.1. CODE DESIGN -- 2.2. ENVIRONMENT SETUP -- 2.3. CLASSES AND INTERFACES -- 2.3.1. A Distribution Interface -- 2.3.2. A Concrete Distribution -- 2.3.2.1. Dataclasses -- 2.3.2.2. Immutability -- 2.3.3. Checking Types -- 2.3.3.1. Static Typing -- 2.3.4. Type Variables -- 2.3.5. Functionality -- 2.4. ABSTRACTING OVER COMPUTATION -- 2.4.1. First-Class Functions -- 2.4.1.1. Lambdas -- 2.4.2. Iterative Algorithms -- 2.4.2.1. Iterators and Generators -- 2.5. KEY TAKEAWAYS FROM THIS CHAPTER -- MODULE I: Processes and Planning Algorithms -- Chapter 3: Markov Processes -- 3.1. THE CONCEPT OF STATE IN A PROCESS -- 3.2. UNDERSTANDING MARKOV PROPERTY FROM STOCK PRICE EXAMPLES -- 3.3. FORMAL DEFINITIONS FOR MARKOV PROCESSES -- 3.3.1. Starting States -- 3.3.2. Terminal States -- 3.3.3. Markov Process Implementation -- 3.4. STOCK PRICE EXAMPLES MODELED AS MARKOV PROCESSES -- 3.5. FINITE MARKOV PROCESSES -- 3.6. SIMPLE INVENTORY EXAMPLE -- 3.7. STATIONARY DISTRIBUTION OF A MARKOV PROCESS. | |
505 | 8 | |a 3.8. FORMALISM OF MARKOV REWARD PROCESSES -- 3.9. SIMPLE INVENTORY EXAMPLE AS A MARKOV REWARD PROCESS -- 3.10. FINITE MARKOV REWARD PROCESSES -- 3.11. SIMPLE INVENTORY EXAMPLE AS A FINITE MARKOV REWARD PROCESS -- 3.12. VALUE FUNCTION OF A MARKOV REWARD PROCESS -- 3.13. SUMMARY OF KEY LEARNINGS FROM THIS CHAPTER -- Chapter 4: Markov Decision Processes -- 4.1. SIMPLE INVENTORY EXAMPLE: HOW MUCH TO ORDER? -- 4.2. THE DIFFICULTY OF SEQUENTIAL DECISIONING UNDER UNCERTAINTY -- 4.3. FORMAL DEFINITION OF A MARKOV DECISION PROCESS -- 4.4. POLICY -- 4.5. [MARKOV DECISION PROCESS, POLICY] := MARKOV REWARD PROCESS -- 4.6. SIMPLE INVENTORY EXAMPLE WITH UNLIMITED CAPACITY (INFINITE STATE/ACTION SPACE) -- 4.7. FINITE MARKOV DECISION PROCESSES -- 4.8. SIMPLE INVENTORY EXAMPLE AS A FINITE MARKOV DECISION PROCESS -- 4.9. MDP VALUE FUNCTION FOR A FIXED POLICY -- 4.10. OPTIMAL VALUE FUNCTION AND OPTIMAL POLICIES -- 4.11. VARIANTS AND EXTENSIONS OF MDPS -- 4.11.1. Size of Spaces and Discrete versus Continuous -- 4.11.1.1. State Space -- 4.11.1.2. Action Space -- 4.11.1.3. Time Steps -- 4.11.2. Partially-Observable Markov Decision Processes (POMDPs) -- 4.12. SUMMARY OF KEY LEARNINGS FROM THIS CHAPTER -- Chapter 5: Dynamic Programming Algorithms -- 5.1. PLANNING VERSUS LEARNING -- 5.2. USAGE OF THE TERM DYNAMIC PROGRAMMING -- 5.3. FIXED-POINT THEORY -- 5.4. BELLMAN POLICY OPERATOR AND POLICY EVALUATION ALGORITHM -- 5.5. GREEDY POLICY -- 5.6. POLICY IMPROVEMENT -- 5.7. POLICY ITERATION ALGORITHM -- 5.8. BELLMAN OPTIMALITY OPERATOR AND VALUE ITERATION ALGORITHM -- 5.9. OPTIMAL POLICY FROM OPTIMAL VALUE FUNCTION -- 5.10. REVISITING THE SIMPLE INVENTORY EXAMPLE -- 5.11. GENERALIZED POLICY ITERATION -- 5.12. ASYNCHRONOUS DYNAMIC PROGRAMMING -- 5.13. FINITE-HORIZON DYNAMIC PROGRAMMING: BACKWARD INDUCTION -- 5.14. DYNAMIC PRICING FOR END-OF-LIFE/END-OF-SEASON OF A PRODUCT. | |
505 | 8 | |a 5.15. GENERALIZATION TO NON-TABULAR ALGORITHMS -- 5.16. SUMMARY OF KEY LEARNINGS FROM THIS CHAPTER -- Chapter 6: Function Approximation and Approximate Dynamic Programming -- 6.1. FUNCTION APPROXIMATION -- 6.2. LINEAR FUNCTION APPROXIMATION -- 6.3. NEURAL NETWORK FUNCTION APPROXIMATION -- 6.4. TABULAR AS A FORM OF FUNCTIONAPPROX -- 6.5. APPROXIMATE POLICY EVALUATION -- 6.6. APPROXIMATE VALUE ITERATION -- 6.7. FINITE-HORIZON APPROXIMATE POLICY EVALUATION -- 6.8. FINITE-HORIZON APPROXIMATE VALUE ITERATION -- 6.9. FINITE-HORIZON APPROXIMATE Q-VALUE ITERATION -- 6.10. HOW TO CONSTRUCT THE NON-TERMINAL STATES DISTRIBUTION -- 6.11. KEY TAKEAWAYS FROM THIS CHAPTER -- MODULE II: Modeling Financial Applications -- Chapter 7: Utility Theory -- 7.1. INTRODUCTION TO THE CONCEPT OF UTILITY -- 7.2. A SIMPLE FINANCIAL EXAMPLE -- 7.3. THE SHAPE OF THE UTILITY FUNCTION -- 7.4. CALCULATING THE RISK-PREMIUM -- 7.5. CONSTANT ABSOLUTE RISK-AVERSION (CARA) -- 7.6. A PORTFOLIO APPLICATION OF CARA -- 7.7. CONSTANT RELATIVE RISK-AVERSION (CRRA) -- 7.8. A PORTFOLIO APPLICATION OF CRRA -- 7.9. KEY TAKEAWAYS FROM THIS CHAPTER -- Chapter 8: Dynamic Asset-Allocation and Consumption -- 8.1. OPTIMIZATION OF PERSONAL FINANCE -- 8.2. MERTON'S PORTFOLIO PROBLEM AND SOLUTION -- 8.3. DEVELOPING INTUITION FOR THE SOLUTION TO MERTON'S PORTFOLIO PROBLEM -- 8.4. A DISCRETE-TIME ASSET-ALLOCATION EXAMPLE -- 8.5. PORTING TO REAL-WORLD -- 8.6. KEY TAKEAWAYS FROM THIS CHAPTER -- Chapter 9: Derivatives Pricing and Hedging -- 9.1. A BRIEF INTRODUCTION TO DERIVATIVES -- 9.1.1. Forwards -- 9.1.2. European Options -- 9.1.3. American Options -- 9.2. NOTATION FOR THE SINGLE-PERIOD SIMPLE SETTING -- 9.3. PORTFOLIOS, ARBITRAGE AND RISK-NEUTRAL PROBABILITY MEASURE -- 9.4. FIRST FUNDAMENTAL THEOREM OF ASSET PRICING (1ST FTAP) -- 9.5. SECOND FUNDAMENTAL THEOREM OF ASSET PRICING (2ND FTAP) | |
505 | 8 | |a 9.6. DERIVATIVES PRICING IN SINGLE-PERIOD SETTING -- 9.6.1. Derivatives Pricing When Market Is Complete -- 9.6.2. Derivatives Pricing When Market Is Incomplete -- 9.6.3. Derivatives Pricing When Market Has Arbitrage -- 9.7. DERIVATIVES PRICING IN MULTI-PERIOD/CONTINUOUS-TIME -- 9.7.1. Multi-Period Complete-Market Setting -- 9.7.2. Continuous-Time Complete-Market Setting -- 9.8. OPTIMAL EXERCISE OF AMERICAN OPTIONS CAST AS A FINITE MDP -- 9.9. GENERALIZING TO OPTIMAL-STOPPING PROBLEMS -- 9.10. PRICING/HEDGING IN AN INCOMPLETE MARKET CAST AS AN MDP -- 9.11. KEY TAKEAWAYS FROM THIS CHAPTER -- Chapter 10: Order-Book Trading Algorithms -- 10.1. BASICS OF ORDER BOOK AND PRICE IMPACT -- 10.2. OPTIMAL EXECUTION OF A MARKET ORDER -- 10.2.1. Simple Linear Price Impact Model with No Risk-Aversion -- 10.2.2. Paper by Bertsimas and Lo on Optimal Order Execution -- 10.2.3. Incorporating Risk-Aversion and Real-World Considerations -- 10.3. OPTIMAL MARKET-MAKING -- 10.3.1. Avellaneda-Stoikov Continuous-Time Formulation -- 10.3.2. Solving the Avellaneda-Stoikov Formulation -- 10.3.3. Analytical Approximation to the Solution to Avellaneda-Stoikov Formulation -- 10.3.4. Real-World Market-Making -- 10.4. KEY TAKEAWAYS FROM THIS CHAPTER -- MODULE III: Reinforcement Learning Algorithms -- Chapter 11: Monte-Carlo and Temporal-Difference for Prediction -- 11.1. OVERVIEW OF THE REINFORCEMENT LEARNING APPROACH -- 11.2. RL FOR PREDICTION -- 11.3. MONTE-CARLO (MC) PREDICTION -- 11.4. TEMPORAL-DIFFERENCE (TD) PREDICTION -- 11.5. TD VERSUS MC -- 11.5.1. TD Learning Akin to Human Learning -- 11.5.2. Bias, Variance and Convergence -- 11.5.3. Fixed-Data Experience Replay on TD versus MC -- 11.5.4. Bootstrapping and Experiencing -- 11.6. TD(λ) PREDICTION -- 11.6.1. n-Step Bootstrapping Prediction Algorithm -- 11.6.2. λ-Return Prediction Algorithm -- 11.6.3. Eligibility Traces | |
505 | 8 | |a 11.6.4. Implementation of the TD(λ) Prediction Algorithm -- 11.7. KEY TAKEAWAYS FROM THIS CHAPTER -- Chapter 12: Monte-Carlo and Temporal-Difference for Control -- 12.1. REFRESHER ON GENERALIZED POLICY ITERATION (GPI) -- 12.2. GPI WITH EVALUATION AS MONTE-CARLO -- 12.3. GLIE MONTE-CARLO CONTROL -- 12.4. SARSA -- 12.5. SARSA(λ) -- 12.6. OFF-POLICY CONTROL -- 12.6.1. Q-Learning -- 12.6.2. Windy Grid -- 12.6.3. Importance Sampling -- 12.7. CONCEPTUAL LINKAGE BETWEEN DP AND TD ALGORITHMS -- 12.8. CONVERGENCE OF RL ALGORITHMS -- 12.9. KEY TAKEAWAYS FROM THIS CHAPTER -- Chapter 13: Batch RL, Experience-Replay, DQN, LSPI, Gradient TD -- 13.1. BATCH RL AND EXPERIENCE-REPLAY -- 13.2. A GENERIC IMPLEMENTATION OF EXPERIENCE-REPLAY -- 13.3. LEAST-SQUARES RL PREDICTION -- 13.3.1. Least-Squares Monte-Carlo (LSMC) -- 13.3.2. Least-Squares Temporal-Difference (LSTD) -- 13.3.3. LSTD(λ) -- 13.3.4. Convergence of Least-Squares Prediction -- 13.4. Q-LEARNING WITH EXPERIENCE-REPLAY -- 13.4.1. Deep Q-Networks (DQN) Algorithm -- 13.5. LEAST-SQUARES POLICY ITERATION (LSPI) -- 13.5.1. Saving Your Village from a Vampire -- 13.5.2. Least-Squares Control Convergence -- 13.6. RL FOR OPTIMAL EXERCISE OF AMERICAN OPTIONS -- 13.6.1. LSPI for American Options Pricing -- 13.6.2. Deep Q-Learning for American Options Pricing -- 13.7. VALUE FUNCTION GEOMETRY -- 13.7.1. Notation and Definitions -- 13.7.2. Bellman Policy Operator and Projection Operator -- 13.7.3. Vectors of Interest in the Φ Subspace -- 13.8. GRADIENT TEMPORAL-DIFFERENCE (GRADIENT TD) -- 13.9. KEY TAKEAWAYS FROM THIS CHAPTER -- Chapter 14: Policy Gradient Algorithms -- 14.1. ADVANTAGES AND DISADVANTAGES OF POLICY GRADIENT ALGORITHMS -- 14.2. POLICY GRADIENT THEOREM -- 14.2.1. Notation and Definitions -- 14.2.2. Statement of the Policy Gradient Theorem -- 14.2.3. Proof of the Policy Gradient Theorem | |
505 | 8 | |a 14.3. SCORE FUNCTION FOR CANONICAL POLICY FUNCTIONS. | |
653 | 6 | |a Electronic books | |
700 | 1 | |a Jelvis, Tikhon |e Sonstige |4 oth | |
776 | 0 | 8 | |i Erscheint auch als |n Druck-Ausgabe |a Rao, Ashwin |t Foundations of Reinforcement Learning with Applications in Finance |d Milton : CRC Press LLC,c2022 |z 9781032124124 |
912 | |a ZDB-30-PQE | ||
999 | |a oai:aleph.bib-bvb.de:BVB01-034007501 | ||
966 | e | |u https://ebookcentral.proquest.com/lib/hwr/detail.action?docID=7121499 |l HWR01 |p ZDB-30-PQE |q HWR_PDA_PQE |x Aggregator |3 Volltext |
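For context, union catalogues such as this one typically deliver the record above in machine-readable form as MARCXML (namespace `http://www.loc.gov/MARC21/slim`). The snippet below is a purely illustrative sketch, not part of the catalogue record: it shows how the title, author and ISBN fields of such an export could be read with only the Python standard library. The file name `BV048632482.xml` and the helper `subfields` are assumptions made for this example.

```python
# Illustrative sketch only: read a MARCXML export of this record with the
# Python standard library. File name and helper are hypothetical.
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}  # standard MARC 21 slim namespace


def subfields(record, tag, code):
    """Return all values of subfield `code` in datafields with the given tag."""
    return [
        sf.text
        for df in record.findall(f"marc:datafield[@tag='{tag}']", NS)
        for sf in df.findall(f"marc:subfield[@code='{code}']", NS)
        if sf.text
    ]


# Assumes a <collection> wrapper around a single <record>, as in typical exports.
tree = ET.parse("BV048632482.xml")
record = tree.find(".//marc:record", NS)

print("Title:", subfields(record, "245", "a"))
print("Author:", subfields(record, "100", "a"))
print("ISBNs:", subfields(record, "020", "a"))
```

Against this record, the sketch would be expected to print the 245 $a title, the 100 $a author (Rao, Ashwin) and the two 020 $a ISBNs listed above.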
Record in the search index
_version_ | 1804184765809557504 |
---|---|
adam_txt | |
any_adam_object | |
any_adam_object_boolean | |
author | Rao, Ashwin |
author_facet | Rao, Ashwin |
author_role | aut |
author_sort | Rao, Ashwin |
author_variant | a r ar |
building | Verbundindex |
bvnumber | BV048632482 |
collection | ZDB-30-PQE |
ctrlnum | (ZDB-30-PQE)EBC7121499 (ZDB-30-PAD)EBC7121499 (ZDB-89-EBL)EBL7121499 (OCoLC)1349282032 (DE-599)BVBBV048632482 |
format | Electronic eBook |
id | DE-604.BV048632482 |
illustrated | Not Illustrated |
index_date | 2024-07-03T21:16:06Z |
indexdate | 2024-07-10T09:44:33Z |
institution | BVB |
isbn | 9781000801101 9781032124124 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-034007501 |
oclc_num | 1349282032 |
open_access_boolean | |
owner | DE-2070s |
owner_facet | DE-2070s |
physical | 1 Online-Ressource (522 Seiten) |
psigel | ZDB-30-PQE ZDB-30-PQE HWR_PDA_PQE |
publishDate | 2022 |
publishDateSearch | 2022 |
publishDateSort | 2022 |
publisher | CRC Press LLC |
record_format | marc |
title | Foundations of Reinforcement Learning with Applications in Finance |
title_auth | Foundations of Reinforcement Learning with Applications in Finance |
title_exact_search | Foundations of Reinforcement Learning with Applications in Finance |
title_exact_search_txtP | Foundations of Reinforcement Learning with Applications in Finance |
title_full | Foundations of Reinforcement Learning with Applications in Finance |
title_fullStr | Foundations of Reinforcement Learning with Applications in Finance |
title_full_unstemmed | Foundations of Reinforcement Learning with Applications in Finance |
title_short | Foundations of Reinforcement Learning with Applications in Finance |
title_sort | foundations of reinforcement learning with applications in finance |
work_keys_str_mv | AT raoashwin foundationsofreinforcementlearningwithapplicationsinfinance AT jelvistikhon foundationsofreinforcementlearningwithapplicationsinfinance |