Reinforcement Learning and Optimal Control
Tentative Chinese title: 強化學習與最佳控制
Dimitri Bertsekas
- Publisher: Athena Scientific
- Publication date: 2019-07-15
- List price: $3,610
- VIP price: $3,574 (1% off)
- Language: English
- Pages: 388
- Binding: Hardcover
- ISBN: 1886529396
- ISBN-13: 9781886529397
- Related categories: Reinforcement, DeepLearning
- Related translation: 強化學習與最優控制 (Simplified Chinese edition)
Description
This book considers large and challenging multistage decision problems, which can in principle be solved by dynamic programming, but whose exact solution is computationally intractable. It can be used as a textbook or for self-study, in conjunction with instructional videos, slides, and other supporting material available from the author's website.

The book discusses solution methods that rely on approximations to produce suboptimal policies with adequate performance. These methods are known by several essentially equivalent names: reinforcement learning, approximate dynamic programming, and neuro-dynamic programming. They underlie, among others, the recent impressive successes of self-learning in games such as chess and Go.

One aim of the book is to explore the common boundary between artificial intelligence and optimal control, and to form a bridge accessible to workers with a background in either field. Another aim is to organize coherently the broad mosaic of methods that have proved successful in practice while having a solid theoretical and/or logical foundation. This may help researchers and practitioners find their way through the maze of competing ideas that constitutes the current state of the art.

The mathematical style of this book is somewhat different from that of the author's other books. While we provide a rigorous, albeit short, mathematical account of the theory of finite and infinite horizon dynamic programming, together with some fundamental approximation methods, we rely more on intuitive explanations and less on proof-based insights. We also illustrate the methodology with many example algorithms and applications.
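To make the central idea concrete, the sketch below shows finite-horizon dynamic programming by backward induction on a toy problem. The two states, two actions, rewards, and transition probabilities are hypothetical, chosen only for illustration; the point is that exact DP sweeps backward over every state at every stage, which is exactly what becomes intractable when the state space is large and motivates the approximation methods the book covers.

```python
# Finite-horizon dynamic programming (backward induction) on a toy
# 2-state, 2-action problem. All numbers below are made up for illustration.

N = 3                   # horizon: decisions at stages 0, 1, 2
states = [0, 1]
actions = [0, 1]

# P[a][s] -> list of (next_state, probability); R[a][s] -> expected reward
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(0, 0.4), (1, 0.6)]},
    1: {0: [(0, 0.2), (1, 0.8)], 1: [(0, 0.7), (1, 0.3)]},
}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.5, 1: 2.0}}

# Terminal value is zero; sweep backward from stage N-1 down to stage 0,
# at each stage choosing the action that maximizes reward-to-go.
V = {s: 0.0 for s in states}   # V_N
policy = []                    # policy[k][s] = optimal action at stage k
for k in reversed(range(N)):
    newV, mu = {}, {}
    for s in states:
        best_a, best_q = None, float("-inf")
        for a in actions:
            q = R[a][s] + sum(p * V[s2] for s2, p in P[a][s])
            if q > best_q:
                best_a, best_q = a, q
        newV[s], mu[s] = best_q, best_a
    V, policy = newV, [mu] + policy

# V now holds the optimal expected reward-to-go from stage 0;
# policy holds one decision rule per stage.
```

Note that the sweep touches every state-action pair at every stage: with millions of states (as in chess or Go positions) this exact computation is hopeless, which is where approximate policies come in.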
About the Author
Dimitri Bertsekas is McAfee Professor of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, and a member of the National Academy of Engineering. He has researched a broad variety of subjects in optimization theory, control theory, parallel and distributed computation, systems analysis, and data communication networks. He has written numerous papers in each of these areas and has authored or coauthored seventeen textbooks.

Professor Bertsekas was awarded the INFORMS 1997 Prize for Research Excellence in the Interface Between Operations Research and Computer Science for his book "Neuro-Dynamic Programming"; the 2001 ACC John R. Ragazzini Education Award; the 2009 INFORMS Expository Writing Award; the 2014 ACC Richard E. Bellman Control Heritage Award for "contributions to the foundations of deterministic and stochastic optimization-based methods in systems and control"; the 2014 Khachiyan Prize for Life-Time Accomplishments in Optimization; and the 2015 George B. Dantzig Prize. In 2018, he was awarded, jointly with his coauthor John Tsitsiklis, the INFORMS John von Neumann Theory Prize for the contributions of the research monographs "Parallel and Distributed Computation" and "Neuro-Dynamic Programming". In 2001, he was elected to the United States National Academy of Engineering for "pioneering contributions to fundamental research, practice and education of optimization/control theory".