Learning Spark: Lightning-Fast Big Data Analysis (Paperback)

Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia

Product Description

The Web is getting faster, and the data it delivers is getting bigger. How can you handle everything efficiently? This book introduces Spark, an open source cluster computing system that makes data analytics fast to run and fast to write. You’ll learn how to run programs faster, using primitives for in-memory cluster computing. With Spark, your job can load data into memory and query it repeatedly much quicker than with disk-based systems like Hadoop MapReduce.

Written by the developers of Spark, this book will have you up and running in no time. You’ll learn how to express MapReduce jobs with just a few simple lines of Spark code, instead of spending extra time and effort working with Hadoop’s raw Java API.
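To illustrate that brevity, the classic word-count job takes only a few chained calls in Spark. The sketch below is not from the book: the Spark lines appear only as comments (the file path and the `sc` SparkContext are assumptions), and the executable part re-creates the same flatMap/map/reduceByKey steps in plain single-machine Python so the logic can be checked without a cluster.

```python
from collections import Counter

# Hypothetical Spark version, a few lines against the RDD API
# (`sc` is an assumed SparkContext; the path is made up):
#   counts = (sc.textFile("hdfs://.../input.txt")
#               .flatMap(lambda line: line.split())
#               .map(lambda word: (word, 1))
#               .reduceByKey(lambda a, b: a + b))

def word_count(lines):
    """Same pipeline on a local list of strings."""
    # flatMap + map: split lines into words, pair each word with 1
    pairs = [(word, 1) for line in lines for word in line.split()]
    # reduceByKey: sum the counts per word
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

print(word_count(["to be or", "not to be"]))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

The Hadoop MapReduce equivalent requires a Mapper class, a Reducer class, and job-configuration boilerplate in Java; Spark collapses those into the chained calls shown in the comment.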

  • Quickly dive into Spark capabilities such as collect, count, reduce, and save
  • Use one programming paradigm instead of mixing and matching tools such as Hive, Hadoop, Mahout, and S4/Storm
  • Learn how to run interactive, iterative, and incremental analyses
  • Integrate with Scala to manipulate distributed datasets like local collections
  • Tackle partitioning issues, data locality, default hash partitioning, user-defined partitioners, and custom serialization
  • Use other languages by means of pipe() to achieve the equivalent of Hadoop streaming
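The "like local collections" point above can be sketched without a cluster: the toy class below mimics only the call pattern of `map`, `filter`, `collect`, `count`, and `reduce` on an in-memory list. It is an illustration written for this page, not Spark's implementation; real RDDs partition data across machines and evaluate lazily.

```python
from functools import reduce as _reduce

class LocalRDD:
    """Toy, single-machine stand-in for the RDD call pattern.
    Eager and in-memory; real RDDs are distributed and lazy."""

    def __init__(self, data):
        self._data = list(data)

    def map(self, f):
        # Transformation: returns a new dataset, like RDD.map
        return LocalRDD(f(x) for x in self._data)

    def filter(self, pred):
        return LocalRDD(x for x in self._data if pred(x))

    def reduce(self, f):
        # Action: combines all elements with a binary function
        return _reduce(f, self._data)

    def count(self):
        return len(self._data)

    def collect(self):
        # Action: materializes the dataset as a local list
        return list(self._data)

rdd = LocalRDD(range(1, 6))
squares = rdd.map(lambda x: x * x)
print(squares.collect())                    # [1, 4, 9, 16, 25]
print(squares.count())                      # 5
print(squares.reduce(lambda a, b: a + b))   # 55
```

The chaining style is the point: transformations like `map` and `filter` compose, and actions like `collect`, `count`, and `reduce` pull results back, the same shape the book's examples use on distributed data.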
