Distributed Data Systems with Azure Databricks: Create, deploy, and manage enterprise data pipelines

Palacio, Alan Bernardo

  • Publisher: Packt Publishing
  • Publication date: 2021-05-25
  • Price: $1,930
  • VIP price: $1,834 (5% off)
  • Language: English
  • Pages: 414
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 183864721X
  • ISBN-13: 9781838647216
  • Related categories: Microsoft Azure
  • Overseas special-order book (requires separate checkout)

Product Description

Quickly build and deploy massive data pipelines and improve productivity using Azure Databricks


Key Features:

  • Get to grips with the distributed training and deployment of machine learning and deep learning models
  • Learn how ETLs are integrated with Azure Data Factory and Delta Lake
  • Explore deep learning and machine learning models in a distributed computing infrastructure


Book Description:

Microsoft Azure Databricks helps you to harness the power of distributed computing and apply it to create robust data pipelines, along with training and deploying machine learning and deep learning models. Databricks' advanced features enable developers to process, transform, and explore data. Distributed Data Systems with Azure Databricks will help you to put your knowledge of Databricks to work to create big data pipelines.


The book takes a hands-on approach to implementing Azure Databricks and its associated methodologies, so you can become productive quickly. It includes detailed explanations of essential concepts, practical examples, and self-assessment questions. You'll begin with a quick introduction to Databricks' core functionality before performing distributed model training and inference using TensorFlow and Spark MLlib. As you advance, you'll explore MLflow Model Serving on Azure Databricks and implement distributed training pipelines using HorovodRunner in Databricks.


Finally, you'll discover how to transform, use, and obtain insights from massive amounts of data to train predictive models and create complete, fully working data pipelines. By the end of this book, you'll have a solid understanding of how to work with Databricks to create and manage an entire big data pipeline.


What You Will Learn:

  • Create ETLs for big data in Azure Databricks
  • Train, manage, and deploy machine learning and deep learning models
  • Integrate Databricks with Azure Data Factory for extract, transform, load (ETL) pipeline creation
  • Discover how to use Horovod for distributed deep learning
  • Find out how to use Delta Engine to query and process data from Delta Lake
  • Understand how to use Data Factory in combination with Databricks
  • Use Structured Streaming in a production-like environment


Who this book is for:

This book is for software engineers, machine learning engineers, data scientists, and data engineers who are new to Azure Databricks and want to build high-quality data pipelines without worrying about infrastructure. Familiarity with Azure Databricks basics will help you learn the concepts covered in this book more effectively. A basic understanding of machine learning concepts and beginner-level Python programming knowledge are also recommended.
