Fast Data Processing with Spark, 2/e(Paperback)
Krishna Sankar, Holden Karau
- Publisher: Packt Publishing
- Publication date: 2015-03-31
- Price: $1,360
- Member price: 5% off, $1,292
- Language: English
- Pages: 184
- Binding: Paperback
- ISBN: 178439257X
- ISBN-13: 9781784392574
Related categories:
Spark
Overseas special-order books (checked out separately)
Product Description
Perform real-time analytics using Spark in a fast, distributed, and scalable way
About This Book
- Develop a machine learning system with Spark's MLlib and scalable algorithms
- Deploy Spark jobs to a variety of cluster environments, such as Mesos, EC2, YARN, and EMR, including provisioning with Chef
- A step-by-step tutorial covering Spark's core capabilities and its latest features
Who This Book Is For
Fast Data Processing with Spark - Second Edition is for software developers who want to learn how to write distributed programs with Spark. It is aimed at developers who face problems too large to handle on a single computer. No previous experience with distributed programming is necessary. The book assumes knowledge of Java, Scala, or Python.
What You Will Learn
- Install and set up Spark on your cluster
- Prototype distributed applications with Spark's interactive shell
- Learn different ways to interact with Spark's distributed data abstraction, Resilient Distributed Datasets (RDDs)
- Query Spark with a SQL-like query syntax
- Effectively test your distributed software
- Understand how Spark handles big data
- Implement machine learning systems with highly scalable algorithms
In Detail
Spark is a framework for writing fast, distributed programs. Spark solves problems similar to those addressed by Hadoop MapReduce, but with a fast in-memory approach and a clean, functional-style API. It integrates with Hadoop and includes built-in tools for interactive query analysis (Spark SQL), large-scale graph processing and analysis (GraphX), and real-time analysis (Spark Streaming), so it can be used interactively to process and query big datasets quickly.
Fast Data Processing with Spark - Second Edition covers how to write distributed programs with Spark. The book guides you through every step required to write effective distributed programs, from setting up your cluster and interactively exploring the API to developing analytics applications and tuning them for your purposes.
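The functional-style API mentioned above is usually illustrated with a word-count pipeline built from chained transformations. The sketch below is plain Python, not Spark: it runs locally in a single process and only mimics the shape of RDD operations. In PySpark the same pipeline would typically read `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`, assuming `sc` is an existing SparkContext and `path` is a placeholder. The helpers `reduce_by_key` and `word_count` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from functools import reduce
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def reduce_by_key(pairs: Iterable[Tuple[K, V]], func: Callable[[V, V], V]) -> Dict[K, V]:
    """Hypothetical helper, not part of Spark: a local stand-in for RDD.reduceByKey.

    Groups values by key, then folds each group with `func`.
    """
    grouped: Dict[K, List[V]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # flatMap step: split every line into individual words
    words = (word for line in lines for word in line.split())
    # map step: pair each word with an initial count of 1
    pairs = ((word, 1) for word in words)
    # reduceByKey step: sum the counts for each distinct word
    return reduce_by_key(pairs, lambda a, b: a + b)


if __name__ == "__main__":
    text = ["spark is fast", "spark is distributed"]
    print(word_count(text))  # {'spark': 2, 'is': 2, 'fast': 1, 'distributed': 1}
```

In Spark, each of these steps runs in parallel across partitions and intermediate data can stay in memory; this local version only shows how the data flows from lines to words to per-word counts.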