Apache Spark Quick Start Guide: Quickly learn the art of writing efficient big data applications with Apache Spark

Shrey Mehrotra, Akash Grade

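  • Publisher: Packt Publishing
  • Publication date: 2019-01-31
  • Price: $1,470
  • VIP price: $1,397 (5% off)
  • Language: English
  • Pages: 154
  • Binding: Paperback
  • ISBN: 1789349109
  • ISBN-13: 9781789349108
  • Related categories: Spark, Big Data
  • Overseas special-order title (must be checked out separately)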

Product Description

A practical guide to solving complex data processing challenges by applying the best optimization techniques in Apache Spark.

Key Features

  • Learn about the core concepts and the latest developments in Apache Spark
  • Master writing efficient big data applications with Spark's built-in modules for SQL, Streaming, Machine Learning and Graph analysis
  • Get introduced to a variety of optimizations based on real-world experience

Book Description

Apache Spark is a flexible framework that allows processing of batch and real-time data. Its unified engine has made it quite popular for big data use cases. This book will help you to get started with Apache Spark 2.0 and write big data applications for a variety of use cases.

Apache Spark is one of the most popular big data processing frameworks. Although this book is intended to help you get started with Apache Spark, it also focuses on explaining the core concepts.

This practical guide provides a quick start to the Spark 2.0 architecture and its components. It teaches you how to set up Spark on your local machine. As we move ahead, you will be introduced to resilient distributed datasets (RDDs) and DataFrame APIs, and their corresponding transformations and actions. Then, we move on to the life cycle of a Spark application and learn about the techniques used to debug slow-running applications. You will also go through Spark's built-in modules for SQL, streaming, machine learning, and graph analysis.

Finally, the book will lay out the best practices and optimization techniques that are key for writing efficient Spark applications. By the end of this book, you will have a sound fundamental understanding of the Apache Spark framework and you will be able to write and optimize Spark applications.

What you will learn

  • Learn core concepts such as RDDs, DataFrames, transformations, and more
  • Set up a Spark development environment
  • Choose the right APIs for your applications
  • Understand Spark's architecture and the execution flow of a Spark application
  • Explore built-in modules for SQL, streaming, ML, and graph analysis
  • Optimize your Spark job for better performance

Who this book is for

If you are a big data enthusiast and love processing huge amounts of data, this book is for you. If you are a data engineer looking for the best optimization techniques for your Spark applications, you will find this book helpful. This book also helps data scientists who want to implement their machine learning algorithms in Spark. You need a basic understanding of at least one programming language such as Scala, Python, or Java.

Table of Contents

  1. Introduction to Apache Spark
  2. Apache Spark Installation
  3. Spark RDD
  4. Spark DataFrame and Dataset
  5. Spark Architecture and Application Execution Flow
  6. Spark SQL
  7. Spark Streaming, Machine Learning, and Graph Analysis
  8. Spark Optimizations
