Deep Reinforcement Learning with Python: RLHF for Chatbots and Large Language Models
Sanghi, Nimish

Product Description

Gain a theoretical understanding of the most popular libraries in deep reinforcement learning (deep RL). This new edition focuses on the latest advances in deep RL using a learn-by-coding approach, allowing readers to assimilate and replicate the latest research in this field.

New agent environments, ranging from games and robotics to finance, are explained to help you try different ways to apply reinforcement learning. A chapter on multi-agent reinforcement learning covers how multiple agents compete, while another chapter focuses on the widely used deep RL algorithm proximal policy optimization (PPO). You'll see how reinforcement learning from human feedback (RLHF) has been used by chatbots built on large language models, such as ChatGPT, to improve conversational capabilities.

You'll also review the steps for using the code on multiple cloud systems and deploying models on platforms such as Hugging Face Hub. The code is provided as Jupyter notebooks, which can be run on Google Colab and other similar deep learning cloud platforms, allowing you to tailor the code to your own needs.

Whether it's for applications in gaming, robotics, or Generative AI, Deep Reinforcement Learning with Python will help keep you ahead of the curve.


What You'll Learn

  • Explore Python-based RL libraries, including Stable-Baselines3 and CleanRL
  • Work with diverse RL environments like Gymnasium, PyBullet, and Unity ML-Agents
  • Understand instruction fine-tuning of large language models using RLHF and PPO
  • Study training and optimization techniques using Hugging Face, Weights & Biases, and Optuna

 

Who This Book Is For

Software engineers and machine learning developers eager to sharpen their understanding of deep RL and acquire practical skills in implementing RL algorithms from scratch.

 

About the Author

Nimish Sanghi is a seasoned entrepreneur and angel investor with a rich portfolio of tech ventures in SaaS software and automation with AI across India, the US, and Singapore. He has over 30 years of work experience. Nimish ventured into entrepreneurship in 2006 after holding leadership roles at global corporations such as PwC, IBM, and Oracle.

 

Nimish holds an MBA from the Indian Institute of Management, Ahmedabad, India (IIMA), and a Bachelor of Technology in Electrical Engineering from the Indian Institute of Technology, Kanpur, India (IITK).

 
