Apache Spark 2 for Beginners (Paperback)
Tentative Chinese title: Apache Spark 2 入門指南 (平裝本)

Rajanarayanan Thottuvaikkatumana

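  • Publisher: Packt Publishing
  • Publication date: 2016-09-30
  • List price: $1,710
  • VIP price: $1,625 (5% off)
  • Language: English
  • Pages: 332
  • Binding: Paperback
  • ISBN: 1785885006
  • ISBN-13: 9781785885006
  • Related category: Spark
  • Overseas special-order title (requires separate checkout)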

Product Description

Key Features

  • This book offers an easy introduction to the Spark framework, based on the latest version of Apache Spark 2
  • Perform efficient data processing, machine learning and graph processing using various Spark components
  • A practical guide aimed at beginners to get them up and running with Spark

Book Description

Spark is one of the most widely used large-scale data processing engines, and it runs extremely fast. Its tools are equally useful to application developers and data scientists.

This book starts with the fundamentals of Spark 2 and covers the core data processing framework and API, installation, and application development setup. The Spark programming model is then introduced through real-world examples, followed by Spark SQL programming with DataFrames. An introduction to SparkR comes next. Later, we cover the charting and plotting features of Python in conjunction with Spark data processing. After that, we look at Spark's stream processing, machine learning, and graph processing libraries. The last chapter combines the skills you learned in the preceding chapters to develop a real-world Spark application.

By the end of this book, you will have all the knowledge you need to develop efficient large-scale applications using Apache Spark.

What you will learn

  • Get to know the fundamentals of Spark 2 and the Spark programming model using Scala and Python
  • Learn how to use Spark SQL and DataFrames with Scala and Python
  • Get an introduction to Spark programming using R
  • Perform Spark data processing, charting, and plotting using Python
  • Get acquainted with Spark stream processing using Scala and Python
  • Be introduced to machine learning using Spark MLlib
  • Get started with graph processing using Spark GraphX
  • Bring together all that you've learned and develop a complete Spark application

About the Author

Rajanarayanan Thottuvaikkatumana (Raj) is a seasoned technologist with more than 23 years of software development experience at various multinational companies. He has lived and worked in India, Singapore, and the USA, and is presently based in the UK. His experience includes architecting, designing, and developing software applications. He has worked with various technologies, including major databases, application development platforms, web technologies, and big data technologies. Since 2000, he has worked mainly with Java-related technologies, and does heavy-duty server-side programming in Java and Scala. He has worked on highly concurrent, highly distributed, high-transaction-volume systems. Currently, he is building a next-generation Hadoop YARN-based data processing platform and an application suite built with Spark using Scala.

Raj holds a master's degree in Mathematics and a master's degree in Computer Information Systems, and has many certifications in ITIL and cloud computing to his credit. Raj is the author of Cassandra Design Patterns - Second Edition, published by Packt.

When not working on the assignments his day job demands, Raj is an avid listener of classical music and watches a lot of tennis.

Table of Contents

  1. Spark Fundamentals
  2. Spark Programming Model
  3. Spark SQL
  4. Spark Programming with R
  5. Spark Data Analysis with Python
  6. Spark Stream Processing
  7. Spark Machine Learning
  8. Spark Graph Processing
  9. Designing Spark Applications
