An Invitation to Deep Reinforcement Learning
Tentative Chinese title: 深度強化學習入門
Bernhard Jaeger, Andreas Geiger
- Publisher: Now Publishers
- Publication date: 2025-01-02
- List price: $2,680
- VIP price: 5% off, $2,546
- Language: English
- Pages: 96
- Binding: Quality Paper (trade paperback)
- ISBN-10: 1638284407
- ISBN-13: 9781638284406
Related categories: Reinforcement, DeepLearning
Imported title (overseas order; must be checked out separately)
Product description
Training a deep neural network to maximize a target objective has become the standard recipe for successful machine learning over the last decade. These networks can be optimized with supervised learning if the target objective is differentiable. However, this is not the case for many interesting problems. Common objectives like intersection over union (IoU), bilingual evaluation understudy (BLEU) scores, or rewards cannot be optimized with supervised learning. A common workaround is to define differentiable surrogate losses, which leads to solutions that are suboptimal with respect to the actual objective. In recent years, reinforcement learning (RL) has emerged as a promising alternative for optimizing deep neural networks to maximize non-differentiable objectives. Examples include aligning large language models via human feedback, code generation, object detection, and control problems. This makes RL techniques relevant to the broader machine learning audience. The subject is, however, time-intensive to approach, due to the large range of methods and their often highly theoretical presentation. This monograph takes an approach that differs from classic RL textbooks. Rather than focusing on tabular problems, it introduces RL as a generalization of supervised learning, applied first to non-differentiable objectives and later to temporal problems. Assuming only basic knowledge of supervised learning, the reader will be able to understand state-of-the-art deep RL algorithms such as proximal policy optimization (PPO) after reading this monograph.
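The contrast the description draws between a non-differentiable objective and its differentiable surrogate can be made concrete with IoU on a segmentation mask. The following sketch is not from the monograph; it is a minimal NumPy illustration, with function names (`hard_iou`, `soft_iou_loss`) chosen here for clarity. The hard metric thresholds the predicted probabilities, so gradients cannot flow through it, while the "soft Jaccard" surrogate replaces the threshold with the raw probabilities and is differentiable but no longer identical to the true objective:

```python
import numpy as np

def hard_iou(pred_probs, target, thresh=0.5):
    # The actual evaluation metric: thresholding makes this piecewise
    # constant in pred_probs, hence non-differentiable (zero gradient
    # almost everywhere), so it cannot be trained on directly.
    pred = (pred_probs >= thresh).astype(float)
    inter = (pred * target).sum()
    union = pred.sum() + target.sum() - inter
    return inter / union if union > 0 else 1.0

def soft_iou_loss(pred_probs, target, eps=1e-6):
    # Differentiable surrogate (soft Jaccard loss): uses probabilities
    # in place of hard decisions, so gradients flow, but its optimum
    # need not coincide exactly with the optimum of hard_iou.
    inter = (pred_probs * target).sum()
    union = pred_probs.sum() + target.sum() - inter
    return 1.0 - (inter + eps) / (union + eps)

probs = np.array([0.9, 0.8, 0.3, 0.1])   # predicted foreground probabilities
target = np.array([1.0, 1.0, 1.0, 0.0])  # ground-truth mask
print(hard_iou(probs, target))      # ≈ 0.667 (2 hits out of 3 in the union)
print(soft_iou_loss(probs, target))
```

RL sidesteps the surrogate entirely by treating the hard metric (or any reward) as a black-box signal to be maximized, which is the route the monograph develops.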