Data Ingestion with Python Cookbook: A practical guide to ingesting, monitoring, and identifying errors in the data ingestion process

Esppenchutz, Gláucia

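  • Publisher: Packt Publishing
  • Publication date: 2023-05-31
  • Price: $1,710
  • VIP price: $1,625 (5% off)
  • Language: English
  • Pages: 414
  • Binding: Quality paper (also called trade paper)
  • ISBN: 183763260X
  • ISBN-13: 9781837632602
  • Related category: Python programming language
  • Overseas special-order book (requires separate checkout)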

Product Description

Deploy your data ingestion pipeline, orchestrate, and monitor efficiently to prevent loss of data and quality

Purchase of the print or Kindle book includes a free PDF eBook


Key Features:

  • Harness best practices to create a Python and PySpark data ingestion pipeline
  • Seamlessly automate and orchestrate your data pipelines using Apache Airflow
  • Build a monitoring framework by integrating the concept of data observability into your pipelines


Book Description:

Data Ingestion with Python Cookbook offers a practical approach to designing and implementing data ingestion pipelines. It presents real-world examples with the most widely recognized open source tools on the market to answer commonly asked questions and overcome challenges.

You'll be introduced to designing and working with or without data schemas, as well as creating monitored pipelines with Airflow and data observability principles, all while following industry best practices. The book also addresses challenges associated with reading different data sources and data formats. As you progress through the book, you'll gain a broader understanding of error logging best practices, troubleshooting techniques, data orchestration, monitoring, and storing logs for further consultation.

By the end of the book, you'll have a fully automated setup that lets you start ingesting and monitoring your data pipeline with little effort and that integrates cleanly with the subsequent stages of the ETL process.


What You Will Learn:

  • Implement data observability using monitoring tools
  • Automate your data ingestion pipeline
  • Read analytical and partitioned data, whether schema or non-schema based
  • Debug and prevent data loss through efficient data monitoring and logging
  • Establish data access policies using a data governance framework
  • Construct a data orchestration framework to improve data quality
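
To give a rough idea of what schema-based versus schema-less reading and logged, loss-free ingestion can look like, here is a minimal sketch using only the Python standard library. It is not taken from the book, which works with PySpark and Airflow; the `SCHEMA` mapping, the `ingest` function, and the sample data are all hypothetical names invented for this illustration.

```python
import csv
import io
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ingestion_sketch")

# Hypothetical schema for illustration: column name -> converter function.
SCHEMA = {"id": int, "amount": float, "country": str}


def ingest(source, schema=None):
    """Read CSV rows from a file-like object.

    With a schema, each value is converted and rows that fail conversion
    are logged and set aside rather than silently dropped. Without a
    schema, rows are returned unchanged as dictionaries of strings.
    """
    accepted, rejected = [], []
    # start=2 because line 1 of the input is the header row.
    for line_no, row in enumerate(csv.DictReader(source), start=2):
        if schema is None:
            accepted.append(dict(row))
            continue
        try:
            accepted.append({col: cast(row[col]) for col, cast in schema.items()})
        except (KeyError, TypeError, ValueError) as exc:
            logger.warning("line %d rejected: %r", line_no, exc)
            rejected.append((line_no, dict(row)))
    logger.info("accepted=%d rejected=%d", len(accepted), len(rejected))
    return accepted, rejected


# Made-up sample data: the second and third data rows are malformed.
SAMPLE = "id,amount,country\n1,10.5,BR\nx,3.0,TW\n3,,US\n"
accepted, rejected = ingest(io.StringIO(SAMPLE), schema=SCHEMA)
```

Keeping the rejected rows together with their line numbers, instead of discarding them, is one simple way to make data loss visible. Calling `ingest(io.StringIO(SAMPLE))` without a schema returns all three rows as raw strings.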


Who this book is for:

This book is for data engineers and data enthusiasts seeking a comprehensive understanding of the data ingestion process using popular tools in the open source community. For more advanced learners, this book takes on the theoretical pillars of data governance while providing practical examples of real-world scenarios commonly encountered by data engineers.
