Scaling Machine Learning with Spark: Distributed ML with MLlib, TensorFlow, and PyTorch
Polak, Adi
- Publisher: O'Reilly
- Publication date: 2023-04-11
- List price: $2,700
- Sale price: $2,376 (12% off)
- Language: English
- Pages: 291
- Binding: Quality Paper (also called trade paper)
- ISBN: 1098106822
- ISBN-13: 9781098106829
- Related categories: DeepLearning, Spark, TensorFlow, Machine Learning
Ships immediately (stock: 1)
Product description
Get up to speed on Apache Spark, the popular engine for large-scale data processing, including machine learning and analytics. If you're looking to expand your skill set or advance your career in scalable machine learning with MLlib, distributed PyTorch, and distributed TensorFlow, this practical guide is for you. Using Spark as your main data processing platform, you'll discover several open source technologies designed and built for enriching Spark's ML capabilities.
Scaling Machine Learning with Spark examines various technologies for building end-to-end distributed ML workflows based on the Apache Spark ecosystem with Spark MLlib, MLflow, TensorFlow, PyTorch, and Petastorm. This book shows you when to use each technology and why. If you're a data scientist working with machine learning, you'll learn how to:
- Build practical distributed machine learning workflows, including feature engineering and data formats
- Extend deep learning functionalities beyond Spark by bridging into distributed TensorFlow and PyTorch
- Manage your machine learning experiment lifecycle with MLflow
- Use Petastorm as a storage layer for bridging data from Spark into TensorFlow and PyTorch
- Use machine learning terminology to understand distribution strategies