Data Engineering with AWS Cookbook: A recipe-based approach to help you tackle data engineering problems with AWS services

Phạm, Trâm Ngọc; González, Gonzalo Herreros; Khan, Viquar

  • Publisher: Packt Publishing
  • Publication date: 2024-11-29
  • List price: $2,040
  • VIP price: $1,938 (5% off)
  • Language: English
  • Pages: 528
  • Binding: Quality Paper - also called trade paper
  • ISBN: 1805127284
  • ISBN-13: 9781805127284
  • Related categories: Amazon Web Services
  • Unavailable for order

Product Description

Master AWS data engineering services and techniques for orchestrating pipelines, building layers, and managing migrations

Key Features:

- Get up to speed with the different AWS technologies for data engineering

- Learn the different aspects and considerations of building data lakes, such as security, storage, and operations

- Get hands-on with key AWS services such as Glue, EMR, Redshift, QuickSight, and Athena for practical learning

- Purchase of the print or Kindle book includes a free PDF eBook

Book Description:

Performing data engineering with Amazon Web Services (AWS) combines AWS's scalable infrastructure with robust data processing tools, enabling efficient data pipelines and analytics workflows. This comprehensive guide to AWS data engineering will teach you all you need to know about data lake management, pipeline orchestration, and serving layer construction.

Through clear explanations and hands-on exercises, you'll master essential AWS services such as Glue, EMR, Redshift, QuickSight, and Athena. Additionally, you'll explore various data platform topics such as data governance, data quality, DevOps, CI/CD, planning and performing data migration, and creating Infrastructure as Code. As you progress, you will gain insights into how to enrich your platform and use various AWS cloud services such as Amazon EventBridge, Amazon DataZone, and the AWS Schema Conversion Tool (SCT) and AWS Database Migration Service (DMS) to solve data platform challenges.

Each recipe in this book is tailored to a daily challenge that a data engineering team faces while building a cloud platform. By the end of this book, you will be well-versed in AWS data engineering and have gained proficiency in key AWS services and data processing techniques. You will develop the necessary skills to tackle large-scale data challenges with confidence.

What You Will Learn:

- Define your centralized data lake solution, and secure and operate it at scale

- Identify the most suitable AWS solution for your specific needs

- Build data pipelines using multiple ETL technologies

- Discover how to handle data orchestration and governance

- Explore how to build a high-performing data serving layer

- Delve into DevOps and data quality best practices

- Migrate your data from on-premises to AWS

Who this book is for:

If you're involved in designing, building, or overseeing data solutions on AWS, this book provides proven strategies for addressing challenges in large-scale data environments. Data engineers and big data professionals who want to deepen their understanding of AWS features and optimize their workflows will find value here, even if they are new to the platform. Basic familiarity with AWS security (users and roles) and the command shell is recommended.

Table of Contents

- Managing Data Lake Storage

- Sharing Your Data Across Environments and Accounts

- Ingesting and Transforming Your Data with AWS Glue

- A Deep Dive into AWS Orchestration Frameworks

- Running Big Data Workloads with Amazon EMR

- Governing Your Platform

- Data Quality Management

- DevOps - Defining IaC and Building CI/CD Pipelines

- Monitoring Data Lake Cloud Infrastructure

- Building a Serving Layer with AWS Analytics Services

- Migrating to AWS - Steps, Strategies, and Best Practices for Modernizing Your Analytics and Big Data Workloads

- Harnessing the Power of AWS for Seamless Data Warehouse Migration

- Strategizing Hadoop Migrations - Cost, Data, and Workflow Modernization with AWS
