Data Engineering with AWS: Learn how to design and build cloud-based data transformation pipelines using AWS (Paperback)

Gareth Eagar


Product Description

Key Features

  • Learn about common data architectures and modern approaches to generating value from big data
  • Explore AWS tools for ingesting, transforming, and consuming data, and for orchestrating pipelines
  • Learn how to architect and implement data lakes and data lakehouses for big data analytics

Book Description

Knowing how to architect and implement complex data pipelines is a highly sought-after skill. Data engineers build the pipelines that ingest, transform, and join raw datasets, creating new value from the data in the process.

Amazon Web Services (AWS) offers a range of tools to simplify a data engineer's job, making it the preferred platform for performing data engineering tasks.

This book will take you through the services and the skills you need to architect and implement data pipelines on AWS. You'll begin by reviewing important data engineering concepts and some of the core AWS services that form a part of the data engineer's toolkit. You'll then architect a data pipeline, review raw data sources, transform the data, and learn how the transformed data is used by various data consumers. The book also teaches you about populating data marts and data warehouses, along with how a data lakehouse fits into the picture. Later, you'll be introduced to AWS tools for analyzing data, including those for ad hoc SQL queries and creating visualizations. In the final chapters, you'll understand how the power of machine learning and artificial intelligence can be used to draw new insights from data.

By the end of this AWS book, you'll be able to carry out data engineering tasks and implement a data pipeline on AWS independently.

What you will learn

  • Understand data engineering concepts and emerging technologies
  • Ingest streaming data with Amazon Kinesis Data Firehose
  • Optimize, denormalize, and join datasets with AWS Glue Studio
  • Use Amazon S3 events to trigger a Lambda process to transform a file
  • Run complex SQL queries on data lake data using Amazon Athena
  • Load data into a Redshift data warehouse and run queries
  • Create a visualization of your data using Amazon QuickSight
  • Extract sentiment data from a dataset using Amazon Comprehend
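
One of the patterns listed above, using an Amazon S3 event to trigger a Lambda transform, can be sketched as follows. This is a minimal illustration, not code from the book: the event parsing follows the standard S3 event notification format, while the CSV-to-JSON-Lines transform and the function names are hypothetical examples. The actual S3 read/write calls (via `boto3`) are noted in comments so the transform logic stays testable locally.

```python
import csv
import io
import json


def extract_s3_location(event):
    """Pull bucket name and object key from a standard S3 event notification."""
    record = event["Records"][0]["s3"]
    return record["bucket"]["name"], record["object"]["key"]


def csv_to_json_lines(csv_text):
    """Hypothetical transform: convert CSV text into JSON Lines output."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row) for row in rows)


def lambda_handler(event, context):
    # In a real deployment you would fetch the object here with boto3,
    # e.g. s3.get_object(Bucket=bucket, Key=key), run the transform on
    # its body, and write the result back under a curated prefix.
    bucket, key = extract_s3_location(event)
    return {"bucket": bucket, "key": key}
```

Driving the transform from an S3 event (rather than on a schedule) means each raw file is processed as soon as it lands, which is the event-driven ingestion style the book's pipeline chapters build on.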

Who this book is for

This book is for data engineers, data analysts, and data architects who are new to AWS and looking to extend their skills to the AWS cloud. Anyone who is new to data engineering and wants to learn about the foundational concepts while gaining practical experience with common data engineering services on AWS will also find this book useful.

A basic understanding of big data-related topics and Python coding will help you get the most out of this book, but is not required. Familiarity with the AWS console and core services is also useful, but not necessary.

About the Author

Gareth Eagar has worked in the IT industry for over 25 years, starting in South Africa, then working in the United Kingdom, and now based in the United States. In 2017, he started working at Amazon Web Services (AWS) as a solution architect, working with enterprise customers in the NYC metro area. Gareth has become a recognized subject matter expert for building data lakes on AWS, and in 2019 he launched the Data Lake Day educational event at the AWS Lofts in NYC and San Francisco. He has also delivered a number of public talks and webinars on topics relating to big data, and in 2020 Gareth transitioned to the AWS Professional Services organization as a senior data architect, helping customers architect and build complex data pipelines.

Table of Contents

  1. An Introduction to Data Engineering
  2. Data Management Architectures for Analytics
  3. The AWS Data Engineer's Toolkit
  4. Data Cataloging, Security and Governance
  5. Architecting Data Engineering Pipelines
  6. Ingesting Batch and Streaming Data
  7. Transforming Data to Optimize for Analytics
  8. Identifying and Enabling Data Consumers
  9. Loading Data into a Data Mart
  10. Orchestrating the Data Pipeline
  11. Ad Hoc Queries with Amazon Athena
  12. Visualizing Data with Amazon QuickSight
  13. Enabling Artificial Intelligence and Machine Learning
  14. Wrapping Up the First Part of Your Learning Journey
