Deep Reinforcement Learning with Python: RLHF for Chatbots and Large Language Models

Name: Deep Reinforcement Learning with Python: RLHF for Chatbots and Large Language Models
Price: 2043 TWD
Availability: InStock
Author: Sanghi, Nimish
ISBN: 9798868802720

Publisher: Apress
Publication date: 2024-07-15
List price: $2,150
VIP price: 5% off, $2,043
Language: English
Pages: 634
Binding: Quality Paper - also called trade paper
ISBN: 9798868802720
ISBN-13: 9798868802720
Related categories: Chatbot, LangChain, Python, Programming Languages, Reinforcement, DeepLearning

Ships immediately (stock = 1)

Product Description

Gain a theoretical understanding of the most popular libraries in deep reinforcement learning (deep RL). This new edition focuses on the latest advances in deep RL using a learn-by-coding approach, allowing readers to assimilate and replicate the latest research in this field.

New agent environments ranging from games and robotics to finance are explained to help you try different ways to apply reinforcement learning. A chapter on multi-agent reinforcement learning covers how multiple agents compete, while another chapter focuses on the widely used deep RL algorithm proximal policy optimization (PPO). You'll see how reinforcement learning with human feedback (RLHF) has been used by chatbots built on large language models, such as ChatGPT, to improve conversational capabilities.

You'll also review the steps for using the code on multiple cloud systems and deploying models on platforms such as Hugging Face Hub. The code is in Jupyter Notebook, which can be run on Google Colab and other similar deep learning cloud platforms, allowing you to tailor the code to your own needs.

Whether it's for applications in gaming, robotics, or Generative AI, Deep Reinforcement Learning with Python will help keep you ahead of the curve.

What You'll Learn

Explore Python-based RL libraries, including StableBaselines3 and CleanRL
Work with diverse RL environments like Gymnasium, Pybullet, and Unity ML
Understand instruction finetuning of Large Language Models using RLHF and PPO
Study training and optimization techniques using HuggingFace, Weights and Biases, and Optuna

Who This Book Is For

Software engineers and machine learning developers eager to sharpen their understanding of deep RL and acquire practical skills in implementing RL algorithms from scratch.

About the Author

Nimish is a seasoned entrepreneur and angel investor with a rich portfolio of tech ventures in SaaS software and automation with AI across India, the US, and Singapore. He has over 30 years of work experience. Nimish ventured into entrepreneurship in 2006 after holding leadership roles at global corporations like PwC, IBM, and Oracle.

Nimish holds an MBA from the Indian Institute of Management, Ahmedabad, India (IIMA), and a Bachelor of Technology in Electrical Engineering from the Indian Institute of Technology, Kanpur, India (IITK).