Deep Learning at Scale: At the Intersection of Hardware, Software, and Data

Mall, Suneeta

  • 出版商: O'Reilly
  • 出版日期: 2024-07-23
  • 售價: $2,740
  • 貴賓價: 9.5$2,603
  • 語言: 英文
  • 頁數: 448
  • 裝訂: Quality Paper - also called trade paper
  • ISBN: 1098145283
  • ISBN-13: 9781098145286
  • 相關分類: DeepLearning
  • 立即出貨 (庫存=1)

商品描述

Bringing a deep-learning project into production at scale is quite challenging. To successfully scale your project, a foundational understanding of full stack deep learning, including the knowledge that lies at the intersection of hardware, software, data, and algorithms, is required.

This book illustrates complex concepts of full stack deep learning and reinforces them through hands-on exercises to arm you with tools and techniques to scale your project. A scaling effort is only beneficial when it's effective and efficient. To that end, this guide explains the intricate concepts and techniques that will help you scale effectively and efficiently.

You'll gain a thorough understanding of:

  • How data flows through the deep-learning network and the role the computation graphs play in building your model
  • How accelerated computing speeds up your training and how best you can utilize the resources at your disposal
  • How to train your model using distributed training paradigms, i.e., data, model, and pipeline parallelism
  • How to leverage PyTorch ecosystems in conjunction with NVIDIA libraries and Triton to scale your model training
  • Debugging, monitoring, and investigating the undesirable bottlenecks that slow down your model training
  • How to expedite the training lifecycle and streamline your feedback loop to iterate model development
  • A set of data tricks and techniques and how to apply them to scale your training model
  • How to select the right tools and techniques for your deep-learning project
  • Options for managing the compute infrastructure when running at scale

商品描述(中文翻譯)

將深度學習專案成功地擴展到大規模生產環境是相當具有挑戰性的。要成功擴展專案,需要對全棧深度學習有基礎的理解,包括硬體、軟體、數據和算法的交叉領域知識。本書通過實踐練習來闡述全棧深度學習的複雜概念,並提供工具和技術,幫助您擴展專案。只有在有效和高效的情況下,擴展工作才有益。為此,本指南解釋了複雜的概念和技術,幫助您實現有效和高效的擴展。您將全面了解以下內容:
- 數據如何在深度學習網絡中流動,計算圖在構建模型中的作用
- 加速計算如何加快訓練速度,以及如何最佳利用您手頭的資源
- 如何使用分佈式訓練範式(即數據、模型和管道的並行)訓練模型
- 如何結合PyTorch生態系統、NVIDIA庫和Triton來擴展模型訓練
- 調試、監控和調查減慢模型訓練速度的不良瓶頸
- 如何加快訓練週期,優化反饋循環,迭代模型開發
- 一組數據技巧和技術,以及如何應用它們來擴展訓練模型
- 如何為深度學習專案選擇合適的工具和技術
- 在大規模運行時管理計算基礎設施的選項