Big Data Processing with Apache Spark: Efficiently tackle large datasets and big data analysis with Spark and Python

Price: 1482 TWD
Availability: Online only
Author: Manuel Ignacio Franco Galeano
ISBN: 1789808812

Publisher: Packt Publishing
Publication date: 2018-10-31
List price: NT$1,560
Member price: 5% off, NT$1,482
Language: English
Pages: 142
Binding: Paperback
ISBN-13: 9781789808810
Related category: Spark

Overseas special-order title (must be checked out separately)

Product Description

No need to spend hours ploughing through endless data – let Spark, one of the fastest big data processing engines available, do the hard work for you.

Key Features

- Get up and running with Apache Spark and Python
- Integrate Spark with AWS for real-time analytics
- Apply processed data streams to Apache Spark's machine learning APIs

Book Description

Processing big data in real time is challenging because of the demands of scalability, data consistency, and fault tolerance. This book teaches you how to use Spark to make your overall analytical workflow faster and more efficient. You'll explore the core concepts and tools of the Spark ecosystem, such as Spark Streaming and its API, Spark's machine learning extensions, and Structured Streaming.
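Spark Streaming (the DStream API) handles a live stream as a sequence of small batches, one per batch interval; in PySpark that interval is set when creating a `StreamingContext(sc, batchDuration)`. As a rough illustration only, the plain-Python sketch below models the micro-batching step by bucketing timestamped events into fixed intervals. The helper name `to_micro_batches` and the `(timestamp, value)` event format are hypothetical assumptions; this is not Spark's implementation.

```python
def to_micro_batches(events, batch_interval):
    """Group (timestamp_seconds, value) events into consecutive batches.

    Hypothetical helper, not a Spark API: batch k holds events with
    k * batch_interval <= timestamp < (k + 1) * batch_interval.
    Idle intervals still yield an empty batch, much as a DStream
    produces one (possibly empty) RDD per batch interval.
    """
    if not events:
        return []
    last = max(int(ts // batch_interval) for ts, _ in events)
    batches = [[] for _ in range(last + 1)]
    # Sort by timestamp so values keep arrival order within each batch
    for ts, value in sorted(events, key=lambda e: e[0]):
        batches[int(ts // batch_interval)].append(value)
    return batches


events = [(0.5, "a"), (1.2, "b"), (1.9, "c"), (3.1, "d")]
print(to_micro_batches(events, batch_interval=1.0))  # [['a'], ['b', 'c'], [], ['d']]
```

With a one-second interval, the gap between 1.9 s and 3.1 s shows up as an empty batch rather than being skipped.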

You'll begin by learning data processing fundamentals using the Resilient Distributed Dataset (RDD), SQL, Dataset, and DataFrame APIs. After grasping these fundamentals, you'll move on to using the Spark Streaming APIs to consume data in real time from TCP sockets, and integrate with Amazon Web Services (AWS) to consume streams.
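RDD fundamentals revolve around transformations such as `flatMap`, `map`, and `reduceByKey`; a classic PySpark word count reads `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(add)`. So that it runs without a cluster, the sketch below models that pipeline in plain Python, including the per-partition combine that `reduceByKey` performs before shuffling data between nodes. The helper `word_count` and its round-robin partitioning are illustrative assumptions, not Spark internals.

```python
from operator import add


def word_count(lines, num_partitions=2):
    """Plain-Python model of flatMap -> map -> reduceByKey (hypothetical helper)."""
    # Split the input into partitions, as Spark spreads an RDD across executors
    partitions = [lines[i::num_partitions] for i in range(num_partitions)]
    partial_results = []
    for part in partitions:
        # flatMap: each line -> many words; map: each word -> (word, 1)
        pairs = [(word, 1) for line in part for word in line.split()]
        # Combine within the partition first, before any data is "shuffled"
        local = {}
        for word, one in pairs:
            local[word] = add(local.get(word, 0), one)
        partial_results.append(local)
    # Merge the per-partition results (the shuffle plus the final reduce)
    totals = {}
    for local in partial_results:
        for word, n in local.items():
            totals[word] = add(totals.get(word, 0), n)
    return totals


print(word_count(["to be or", "not to be"]))
```

Combining locally first means each partition sends at most one pair per distinct word across the network, which is why `reduceByKey` is usually preferred over grouping all values and summing afterwards.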

By the end of this book, you'll understand how to use Spark's machine learning extensions and Structured Streaming, and you'll be able to apply Spark in your own big data projects.

What you will learn

- Write your own Python programs that interact with Spark
- Implement data stream consumption using Apache Spark
- Recognize common operations in Spark for processing known data streams
- Integrate Spark Streaming with Amazon Web Services (AWS)
- Create a collaborative filtering model with the MovieLens dataset
- Apply processed data streams to Spark's machine learning APIs
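For the collaborative filtering goal, Spark's `pyspark.ml.recommendation.ALS` estimator factorizes the user-item rating matrix with alternating least squares, configured through parameters such as `rank`, `maxIter`, and `regParam`. The sketch below is a deliberately tiny, plain-Python rank-1 version of ALS that shows the core idea: hold the item factors fixed and solve each user's factor in closed form, then swap roles. The helper `als_rank1` and the toy ratings (standing in for MovieLens data) are made up for illustration; this is not Spark's implementation, which supports higher ranks and scales the regularization differently.

```python
def als_rank1(ratings, n_users, n_items, reg=0.01, iters=20):
    """Rank-1 alternating least squares on (user, item, rating) triples.

    Each user u has a scalar factor x[u] and each item i a scalar y[i];
    the predicted rating is x[u] * y[i]. Only observed ratings are used.
    """
    x = [1.0] * n_users
    y = [1.0] * n_items
    by_user = {u: [] for u in range(n_users)}
    by_item = {i: [] for i in range(n_items)}
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    for _ in range(iters):
        # Fix item factors: ridge least squares gives each user factor directly
        for u, obs in by_user.items():
            x[u] = sum(r * y[i] for i, r in obs) / (sum(y[i] ** 2 for i, _ in obs) + reg)
        # Fix user factors: same closed-form update for each item factor
        for i, obs in by_item.items():
            y[i] = sum(r * x[u] for u, r in obs) / (sum(x[u] ** 2 for u, _ in obs) + reg)
    return x, y


# Toy ratings (made up): user 1 has not rated item 2
ratings = [(0, 0, 5.0), (0, 1, 4.0), (0, 2, 3.0), (1, 0, 4.0), (1, 1, 3.2)]
x, y = als_rank1(ratings, n_users=2, n_items=3)
predicted = x[1] * y[2]  # estimate for the missing (user 1, item 2) rating
print(round(predicted, 2))
```

The toy ratings follow an exact rank-1 pattern (user 1 rates each item at 0.8 times user 0's rating), so the estimate for the missing rating should land close to 2.4.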

Who this book is for

Big Data Processing with Apache Spark is for you if you are a software engineer, architect, or IT professional who wants to explore distributed systems and big data analytics. Although you don't need any prior knowledge of Spark, experience working with Python is recommended.

Table of Contents

1. Introduction to Spark Distributed Processing
2. Introduction to Spark Streaming
3. Spark Streaming Integration with AWS
4. Spark Streaming, ML, and Windowing Operations
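Windowing operations in Spark Streaming aggregate over a sliding window defined by a window length and a slide interval, both of which must be multiples of the batch interval; in PySpark this is exposed through DStream methods such as `window` and `reduceByKeyAndWindow`. The plain-Python sketch below models a windowed word count over a list of micro-batches. The helper name `windowed_counts` is hypothetical, and for simplicity it emits results only once enough batches have arrived to fill a window, which may differ from how Spark handles the first few windows.

```python
from collections import Counter


def windowed_counts(batches, window_len, slide):
    """Count words over a sliding window of micro-batches (hypothetical helper).

    batches: list of batches, each a list of words (one batch per interval)
    window_len: number of batches covered by each window
    slide: number of batches the window advances between results
    """
    results = []
    # Each window ends at batch index `end` and spans the window_len batches up to it
    for end in range(window_len - 1, len(batches), slide):
        counts = Counter()
        for batch in batches[end - window_len + 1 : end + 1]:
            counts.update(batch)
        results.append(dict(counts))
    return results


batches = [["a", "b"], ["a"], ["b", "b"], ["a"], ["c"]]
print(windowed_counts(batches, window_len=3, slide=2))
# [{'a': 2, 'b': 3}, {'b': 2, 'a': 1, 'c': 1}]
```

Because the slide (2) is shorter than the window (3), consecutive windows overlap: batch 2 is counted in both results.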
