Apache Spark 2.x for Java Developers: Explore big data at scale using Apache Spark 2.x Java APIs
Sourav Gulati, Sumit Kumar
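- Publisher: Packt Publishing
- Publication date: 2017-07-27
- Price: $2,150
- VIP price: 5% off, $2,043
- Language: English
- Pages: 350
- Binding: Paperback
- ISBN: 1787126498
- ISBN-13: 9781787126497
- Related categories: Java programming language, Spark
Overseas special-order book (requires separate checkout)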
Product Description
Key Features
- Perform Big Data processing with Spark, without having to learn Scala!
- Use the Spark Java API to implement efficient enterprise-grade applications for data processing and analytics
- Go beyond mainstream data processing by adding querying capability, machine learning, and graph processing using Spark
Book Description
Apache Spark is the buzzword in the Big Data industry right now, especially with the increasing need for real-time streaming and data processing. While Spark is built on Scala, the Spark Java API exposes all the Spark features available in the Scala version for Java developers. This book will show you how you can implement various functionalities of the Apache Spark framework in Java, without stepping out of your comfort zone.
The book starts with an introduction to the Apache Spark ecosystem, explains Spark installation and configuration, and refreshes the Java concepts that will be useful to you when consuming Apache Spark's APIs. You will explore RDDs and their common Action and Transformation Java APIs, set up a production-like clustered environment, and work with Spark SQL. Moving on, you will perform near real-time processing with Spark Streaming, machine learning analytics with Spark MLlib, and graph processing with GraphX using the various Java packages.
By the end of the book, you will have a solid foundation in implementing the components of the Spark framework in Java to build fast, real-time applications.
What you will learn
- Process data in file formats such as XML, JSON, CSV, and plain and delimited text using the Spark Core library
- Perform analytics on data from sources such as Kafka, Flume, and Twitter using the Spark Streaming library
- Learn SQL schema creation and the analysis of structured data using various SQL functions, including the windowing functions of the Spark SQL library
- Explore the Spark MLlib APIs while implementing machine learning techniques to solve real-world problems
- Get to know Spark GraphX so you understand the various graph-based analytics that can be performed with Spark