Delta Lake: Up and Running: Modern Data Lakehouse Architectures with Delta Lake (Paperback)

Bennie Haelen, Dan Davis

  • Publisher: O'Reilly
  • Publication date: 2023-11-21
  • List price: $2,360
  • Sale price: $2,077 (12% off)
  • Language: English
  • Pages: 264
  • Binding: Quality paper (also called trade paper)
  • ISBN: 1098139720
  • ISBN-13: 9781098139728
  • Ships immediately (fewer than 3 in stock)

Product description

With the surge in big data and AI, organizations can rapidly create data products. However, the effectiveness of their analytics and machine learning models depends on the data's quality. Delta Lake's open source format offers a robust lakehouse framework over platforms like Amazon S3, ADLS, and GCS.

This practical book shows data engineers, data scientists, and data analysts how to get Delta Lake and its features up and running. The ultimate goal of building data pipelines and applications is to gain insights from data. You'll understand how your storage solution choice determines the robustness and performance of the data pipeline, from raw data to insights.

You'll learn how to:

  • Use modern data management and data engineering techniques
  • Understand how ACID transactions bring reliability to data lakes at scale
  • Run streaming and batch jobs against your data lake concurrently
  • Execute update, delete, and merge commands against your data lake
  • Use time travel to roll back and examine previous data versions
  • Build a streaming data quality pipeline following the medallion architecture

About the authors

Bennie Haelen is a principal architect with Insight Digital Innovation, a Microsoft and Databricks partner. At Insight, Bennie's primary focus areas are modern data warehousing, machine learning, AI, and IoT on various commercial cloud platforms. Bennie has overseen many data and AI projects in application domains such as health care, the public sector, oil and gas, and finance, and has architected and delivered real-time streaming data lakehouse applications with Databricks, Spark Structured Streaming, Delta Lake, and Microsoft Power BI. Bennie brings a wealth of practical experience in implementing secure, enterprise-scale, lakehouse-based solutions that support business intelligence, data science, and machine learning. Bennie has also been a frequent speaker at Databricks events at Microsoft Technology Centers around the country and spoke at the Data + AI Summit 2021.

Dan Davis is a cloud data architect with a decade of experience delivering analytic insights and business value from data. Using modern tools and technologies, Dan specializes in designing and delivering data platforms, frameworks, and processes that support data integration and analytics for on-premises, hybrid, and cloud environments at enterprise scale.
