Data Engineering with Scala and Spark: Build streaming and batch pipelines that process massive amounts of data using Scala

Name: Data Engineering with Scala and Spark: Build streaming and batch pipelines that process massive amounts of data using Scala
Price: 1786 TWD
Availability: Online only
Author: Eric Tome, Rupam Bhattacharjee, David Radford
ISBN: 1804612588

Publisher: Packt Publishing
Publication date: 2024-01-31
List price: $1,880
Member price: 5% off, $1,786
Language: English
Pages: 300
Binding: Quality Paper (also called trade paper)
ISBN: 1804612588
ISBN-13: 9781804612583
Related categories: JVM languages, Spark

Overseas special-order book (must be checked out separately)

Product description

Take your data engineering skills to the next level by learning how to use Scala and functional programming to create continuous and scheduled pipelines that ingest, transform, and aggregate data.

Key Features:

Transform data into a clean and trusted source of information for your organization using Scala
Build streaming and batch-processing pipelines with step-by-step explanations
Implement and orchestrate your pipelines by following CI/CD best practices and test-driven development (TDD)
Purchase of the print or Kindle book includes a free PDF eBook

Book Description:

Most data engineers know that performance problems in a distributed computing environment can easily undermine the overall efficiency and effectiveness of data engineering tasks. While Python remains a popular choice for data engineering due to its ease of use, Scala shines in scenarios where the performance of distributed data processing is paramount.

This book will teach you how to use the Scala programming language on the Spark framework, together with the latest cloud technologies, to build continuous and triggered data pipelines. You'll do this by setting up a data engineering environment for local development and scalable distributed cloud deployments using data engineering best practices, test-driven development, and CI/CD. You'll also get to grips with the DataFrame API, the Dataset API, and the Spark SQL API and how to use them. Data profiling and data quality in Scala will also be covered, alongside techniques for orchestrating and performance tuning your end-to-end pipelines to deliver data to your end users.

By the end of this book, you will be able to build streaming and batch data pipelines using Scala while following software engineering best practices.

What You Will Learn:

Set up your development environment to build pipelines in Scala
Get to grips with polymorphic functions, type parameterization, and Scala implicits
Use Spark DataFrames, Datasets, and Spark SQL with Scala
Read and write data to object stores
Profile and clean your data using Deequ
Tune the performance of your data pipelines using Scala
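
For readers unfamiliar with the Scala features named in the list above (polymorphic functions, type parameterization, and implicits), here is a minimal sketch in plain Scala. It is not taken from the book; the names `Summable`, `sumAll`, and `firstOrElse` are hypothetical and chosen only for illustration.

```scala
// Illustrative sketch only; all names here are hypothetical, not from the book.

// A type-parameterized trait describing how to combine values of type A.
trait Summable[A] {
  def zero: A
  def plus(x: A, y: A): A
}

object Summable {
  // Implicit instances placed in the companion object so the compiler
  // can find them without an explicit import.
  implicit val intSummable: Summable[Int] = new Summable[Int] {
    def zero: Int = 0
    def plus(x: Int, y: Int): Int = x + y
  }
  implicit val stringSummable: Summable[String] = new Summable[String] {
    def zero: String = ""
    def plus(x: String, y: String): String = x + y
  }
}

object Example {
  // Polymorphic function: the same code works for any element type A.
  def firstOrElse[A](xs: Seq[A], default: A): A =
    xs.headOption.getOrElse(default)

  // Type parameter A plus an implicit parameter that the compiler
  // resolves from the element type at the call site.
  def sumAll[A](xs: Seq[A])(implicit s: Summable[A]): A =
    xs.foldLeft(s.zero)(s.plus)
}
```

Calling `Example.sumAll(Seq(1, 2, 3))` selects the `Int` instance and `Example.sumAll(Seq("a", "b"))` selects the `String` instance, with no change to the function body. A call on a type with no instance in scope is rejected at compile time.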

Who this book is for:

This book is for data engineers who have experience in working with data and want to understand how to transform raw data into a clean, trusted, and valuable source of information for their organization using Scala and the latest cloud technologies.

What you will learn: