Modern Data Architectures with Python: A practical guide to building and deploying data pipelines, data warehouses, and data lakes with Python

Lipp, Brian

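  • Publisher: Packt Publishing
  • Publication date: 2023-09-29
  • Price: $2,030
  • VIP price: $1,929 (5% off)
  • Language: English
  • Pages: 318
  • Binding: Quality paper (also called trade paper)
  • ISBN: 1801070490
  • ISBN-13: 9781801070492
  • Related categories: Python programming language
  • Overseas special-order title (requires separate checkout)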

Product Description

Build scalable and reliable data ecosystems using Data Mesh, Databricks Spark, and Kafka

Key Features

  • Develop modern data skills used in emerging technologies
  • Learn pragmatic design methodologies such as Data Mesh and data lakehouses
  • Gain a deeper understanding of data governance
  • Purchase of the print or Kindle book includes a free PDF eBook

Book Description

Modern Data Architectures with Python will teach you how to seamlessly incorporate your machine learning and data science work streams into your open data platforms. You’ll learn how to take your data and create open lakehouses that work with any technology using tried-and-true techniques, including the medallion architecture and Delta Lake.

Starting with the fundamentals, this book will help you build pipelines on Databricks, an open data platform, using SQL and Python. You’ll gain an understanding of notebooks and applications written in Python using standard software engineering tools such as Git, pre-commit, Jenkins, and GitHub. Next, you’ll delve into streaming and batch-based data processing using Apache Spark and Confluent Kafka. As you advance, you’ll learn how to deploy your resources using infrastructure as code and how to automate your workflows and code development. Since any data platform’s ability to handle and work with AI and ML is a vital component, you’ll also explore the basics of ML and how to work with modern MLOps tooling. Finally, you’ll get hands-on experience with Apache Spark, one of the key data technologies in today’s market.

By the end of this book, you’ll have amassed a wealth of practical and theoretical knowledge to build, manage, orchestrate, and architect your data ecosystems.

What you will learn

  • Understand data patterns including delta architecture
  • Discover how to increase performance with Spark internals
  • Find out how to design critical data diagrams
  • Explore MLOps with tools such as AutoML and MLflow
  • Get to grips with building data products in a data mesh
  • Discover data governance and build confidence in your data
  • Introduce data visualizations and dashboards into your data practice

Who this book is for

This book is for developers, analytics engineers, and managers looking to further develop a data ecosystem within their organization. While they’re not prerequisites, basic knowledge of Python and prior experience with data will help you to read and follow along with the examples.

Table of Contents

  1. Modern Data Processing Architectures
  2. Basics of Data Analytics Engineering
  3. Cloud Storage and Processing Concepts
  4. Python Batch and Stream Processing with Spark
  5. Streaming Data with Kafka
  6. Python MLOps
  7. Python and SQL based Visualizations
  8. Integrating CI into your workflow
  9. Data Orchestration
  10. Data Governance
  11. Introduction to Saturn Insurance, Deploying CI and ELT
  12. Data Governance and Dashboards
