Big Data Processing Using Spark in Cloud (Studies in Big Data)

Publisher: Springer
Publication date: 2018-06-26
Price: $4,450
VIP price (5% off): $4,228
Language: English
Pages: 264
Binding: Hardcover
ISBN: 9811305498
ISBN-13: 9789811305498
Related categories: Spark, Big Data

Product Description

The book describes the emergence of big data technologies and the role of Spark in the overall big data stack. It compares Spark with Hadoop and identifies the shortcomings of Hadoop that Spark overcomes. The book focuses mainly on the in-depth architecture of Spark and on Spark RDDs, explaining how the RDD complements the immutable nature of big data through lazy evaluation, caching, and type inference. It also addresses advanced topics, starting with the basics of Scala and the core Spark framework and going on to explore Spark DataFrames, machine learning with MLlib, graph analytics with GraphX, and real-time processing with Apache Kafka, AWS Kinesis, and Azure Event Hubs. It then covers using Spark from PySpark and R. Focusing on the current big data stack, the book examines how Spark interacts with today's big data tools, with Spark serving as the core processing layer for all types of data.

The book is intended for data engineers and scientists working on massive datasets and big data technologies in the cloud. In addition to industry professionals, it is helpful for aspiring data processing professionals and students working in big data processing and cloud computing environments.