Apache Airflow Best Practices: A practical guide to orchestrating data workflow with Apache Airflow
Dylan Intorf, Dylan Storey, Kendrick van Doorn
- Publisher: Packt Publishing
- Publication date: 2024-10-31
- Price: $1,800
- VIP price: 5% off, $1,710
- Language: English
- Pages: 188
- Binding: Trade paperback (quality paper)
- ISBN: 1805123750
- ISBN-13: 9781805123750
Ships immediately (fewer than 3 in stock)
Product description
Confidently orchestrate your data pipelines with Apache Airflow by applying industry best practices and scalable strategies
Key Features:
- Understand the steps for migrating from Airflow 1.x to 2.x and explore the new features and improvements in version 2.x
- Learn Apache Airflow workflow authoring through real-world use cases
- Uncover strategies to operationalize your Airflow instance and pipelines for resilient operations and high throughput
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description:
Data professionals face the monumental task of managing complex data pipelines, orchestrating workflows across diverse systems, and ensuring scalable, reliable data processing. This definitive guide to mastering Apache Airflow, written by experts in engineering, data strategy, and problem-solving across tech, financial, and life sciences industries, is your key to overcoming these challenges. It covers everything from the basics of Airflow and its core components to advanced topics such as custom plugin development, multi-tenancy, and cloud deployment.
Starting with an introduction to data orchestration and the significant updates in Apache Airflow 2.0, this book takes you through the essentials of DAG authoring, managing Airflow components, and connecting to external data sources. Through real-world use cases, you'll gain practical insights into implementing ETL pipelines and machine learning workflows in your environment. You'll also learn how to deploy Airflow in cloud environments, tackle operational considerations for scaling, and apply best practices for CI/CD and monitoring.
By the end of this book, you'll be proficient in operating and using Apache Airflow, authoring high-quality workflows in Python for your specific use cases, and making informed decisions crucial for production-ready implementation.
What You Will Learn:
- Explore the new features and improvements in Apache Airflow 2.0
- Design and build data pipelines using DAGs
- Implement ETL pipelines, ML workflows, and other advanced use cases
- Develop and deploy custom plugins and UI extensions
- Deploy and manage Apache Airflow in cloud environments such as AWS, GCP, and Azure
- Plan a path for scaling your environment over time
- Apply best practices for monitoring and maintaining Airflow
Who this book is for:
This book is for data engineers, developers, IT professionals, and data scientists who want to optimize workflow orchestration with Apache Airflow. It's perfect for those who recognize Airflow's potential and want to avoid common implementation pitfalls. Whether you're new to data, an experienced professional, or a manager seeking insights, this guide will support you. A functional understanding of Python, some business experience, and basic DevOps skills are helpful. While prior experience with Airflow is not required, it is beneficial.
Table of Contents
- Getting Started with Airflow 2.0
- Core Airflow Concepts
- Components of Airflow
- Basics of Airflow and DAG Authoring
- Connecting to External Sources
- Extending Functionality with UI Plugins
- Writing and Distributing Custom Providers
- Orchestrating a Machine Learning Workflow
- Using Airflow as a Driving Service
- Airflow Ops: Development and Deployment
- Airflow Ops Best Practices: Observation and Monitoring
- Multi-Tenancy in Airflow
- Migrating Airflow