Apache Spark 2.x Cookbook
Rishi Yadav
- Publisher: Packt Publishing
- Publication date: 2017-05-31
- List price: $1,650
- Sale price: 20% off, $1,320
- Language: English
- Pages: 294
- Binding: Paperback
- ISBN: 1787127265
- ISBN-13: 9781787127265
- Related categories: Spark
Product Description
Key Features
- Contains recipes on solving real-time data-processing problems with Apache Spark
- Utilize core Spark modules such as Spark SQL, Spark MLlib, Spark Streaming, and GraphX
- A practical guide to help you master Apache Spark as your single big data computing platform
Book Description
While Apache Spark 1.x gained a lot of traction and adoption in its early years, Spark 2.0 delivers notable improvements in its APIs, performance, and Structured Streaming, and simplifies the building blocks for creating better, faster, smarter, and more accessible big data applications. This book covers all these features in the form of structured recipes for analyzing and refining large, complex data sets.
Starting with installing and configuring Apache Spark with various cluster managers, you will learn to set up development environments. You will then be introduced to working with RDDs, using DataFrames to operate on data with schemas, and real-time streaming from sources such as Twitter Stream and Apache Kafka. You will also work through recipes on machine learning, including supervised learning, unsupervised learning, recommendation engines, deep learning algorithms, and GPU implementations on Spark.
Last but not least, the final few chapters will help you delve deeper into graph processing with GraphX, securing your implementations, cluster optimization, and troubleshooting.
What you will learn
- Install and configure Apache Spark with various cluster managers
- Set up a development environment for Apache Spark
- Learn to operate on data in Spark with schemas
- Get to grips with real-time streaming analytics using Spark Streaming
- Master supervised learning and unsupervised learning using MLlib
- Build a recommendation engine using MLlib
- Use TensorFrames to manipulate Spark's DataFrames with TensorFlow programs for deep learning
- Develop a set of common applications or project types, and solutions that solve complex big data problems