Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control (阿爾法零對最優模型預測自適應控制的啟示)

Name: Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control (阿爾法零對最優模型預測自適應控制的啟示)
Price: 403 TWD
Availability: OnlineOnly
Author: Dimitri P. Bertsekas (USA)
ISBN: 7302684715


Publisher: Tsinghua University Press
Publication date: 2025-04-01
List price: $474
Sale price: $403 (15% off)
Language: Simplified Chinese
ISBN: 7302684715
ISBN-13: 9787302684718
Related category: Machine Learning

Stocked to order upon purchase (approx. 4 to 6 weeks)

Product Description

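Chapter 1 starts from AlphaZero's remarkable performance, examines the hard-won development behind it, and presents its underlying mathematical model. Chapter 2 introduces the mathematical models of decision problems through deterministic and stochastic dynamic programming. Chapter 3 reviews the wide variety of reinforcement learning algorithms from an abstract viewpoint and highlights the key roles of value function approximation and rollout-based improvement. Chapter 4 uses the classical linear quadratic control problem to analyze the lessons learned from AlphaZero's success. Chapter 5 addresses robust, adaptive, and model predictive control, and analyzes how much value function approximation and rollout can improve algorithm performance. Chapter 6 examines the lessons of AlphaZero from the perspective of discrete optimization. Chapter 7 concludes the book. The book is suitable as an academic monograph for researchers in the field and as a reference for graduate and undergraduate students.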

Table of Contents

Contents

1. AlphaZero, Off-Line Training, and On-Line Play

1.1. Off-Line Training and Policy Iteration

1.2. On-Line Play and Approximation in Value Space - Truncated Rollout

1.3. The Lessons of AlphaZero

1.4. A New Conceptual Framework for Reinforcement Learning

1.5. Notes and Sources

2. Deterministic and Stochastic Dynamic Programming

2.1. Optimal Control Over an Infinite Horizon

2.2. Approximation in Value Space

2.3. Notes and Sources

3. An Abstract View of Reinforcement Learning

3.1. Bellman Operators

3.2. Approximation in Value Space and Newton's Method

3.3. Region of Stability

3.4. Policy Iteration, Rollout, and Newton's Method

3.5. How Sensitive is On-Line Play to the Off-Line Training Process?

3.6. Why Not Just Train a Policy Network and Use it Without On-Line Play?

3.7. Multiagent Problems and Multiagent Rollout

3.8. On-Line Simplified Policy Iteration

3.9. Exceptional Cases

3.10. Notes and Sources

4. The Linear Quadratic Case - Illustrations

4.1. Optimal Solution

4.2. Cost Functions of Stable Linear Policies

4.3. Value Iteration

4.4. One-Step and Multistep Lookahead - Newton Step Interpretations

4.5. Sensitivity Issues

4.6. Rollout and Policy Iteration

4.7. Truncated Rollout - Length of Lookahead Issues

4.8. Exceptional Behavior in Linear Quadratic Problems

4.9. Notes and Sources

5. Adaptive and Model Predictive Control

5.1. Systems with Unknown Parameters - Robust and PID Control

5.2. Approximation in Value Space, Rollout, and Adaptive Control

5.3. Approximation in Value Space, Rollout, and Model Predictive Control

5.4. Terminal Cost Approximation - Stability Issues

5.5. Notes and Sources

6. Finite Horizon Deterministic Problems - Discrete Optimization

6.1. Deterministic Discrete Spaces Finite Horizon Problems

6.2. General Discrete Optimization Problems

6.3. Approximation in Value Space

6.4. Rollout Algorithms for Discrete Optimization

6.5. Rollout and Approximation in Value Space with Multistep Lookahead

6.5.1. Simplified Multistep Rollout - Double Rollout

6.5.2. Incremental Rollout for Multistep Approximation in Value Space

6.6. Constrained Forms of Rollout Algorithms

6.7. Adaptive Control by Rollout with a POMDP Formulation

6.8. Rollout for Minimax Control

6.9. Small Stage Costs and Long Horizon - Continuous-Time Rollout

6.10. Epilogue

Appendix A: Newton's Method and Error Bounds

A.1. Newton's Method for Differentiable Fixed Point Problems

A.2. Newton's Method Without Differentiability of the Bellman Operator

A.3. Local and Global Error Bounds for Approximation in Value Space

A.4. Local and Global Error Bounds for Approximate Policy Iteration

References
