Beginning Apache Spark 2: With Resilient Distributed Datasets, Spark SQL, Structured Streaming and Spark Machine Learning library

Hien Luu

  • Publisher: Apress
  • Publication date: 2018-08-17
  • List price: $1,500
  • Sale price: $1,200 (20% off)
  • Language: English
  • Pages: 408
  • Binding: Paperback
  • ISBN: 1484235789
  • ISBN-13: 9781484235782
  • Related categories: Spark, SQL, Machine Learning
  • Ships immediately (stock < 3)

Product Description

Develop applications for the big data landscape with Spark and Hadoop. This book also explains the role of Spark in developing scalable machine learning and analytics applications with cloud technologies. Beginning Apache Spark 2 introduces Apache Spark and shows you how to work with it.
 
Along the way, you’ll discover resilient distributed datasets (RDDs), use Spark SQL for structured data, learn stream processing, and build real-time applications with Spark Structured Streaming. You’ll also learn the fundamentals of Spark ML for machine learning.
 
After you read this book, you will have the fundamentals to become proficient in using Apache Spark and know when and how to apply it to your big data applications.  
 
 
What You Will Learn  
  • Understand Spark's unified data processing platform
  • Run Spark in the Spark shell or on Databricks
  • Use and manipulate RDDs
  • Work with structured data using Spark SQL's operations and advanced functions
  • Build real-time applications using Spark Structured Streaming
  • Develop intelligent applications with the Spark Machine Learning library
 
Who This Book Is For
 
Programmers and developers who are active in big data, Hadoop, and Java but are new to the Apache Spark platform.
 
