Big Data Analytics with Spark and Hadoop
Venkat Ankam
- Publisher: Packt Publishing
- Publication date: 2016-09-29
- List price: $1,600
- Sale price: $960 (40% off)
- Language: English
- Pages: 326
- Binding: Paperback
- ISBN: 1785884697
- ISBN-13: 9781785884696
- Related categories: Hadoop, Spark, Big Data, Data Science
- Related translation: Spark與Hadoop大數據分析 (Big Data Analytics) (Simplified Chinese edition)
Ships immediately (stock: 1)
Product Description
Key Features
- Based on Apache Spark 2.0 and Hadoop 2.7, the latest versions at the time of publication, integrated with the most commonly used tools.
- Learn all Spark stack components, including recent topics such as DataFrames, Datasets, GraphFrames, Structured Streaming, DataFrame-based ML Pipelines, and SparkR.
- Covers integration with frameworks such as HDFS and YARN, and with tools such as Jupyter, Zeppelin, NiFi, Mahout, the HBase Spark Connector, GraphFrames, H2O, and Hivemall.
Book Description
Big Data Analytics aims to provide the fundamentals of Apache Spark and Hadoop. All Spark components (Spark Core, Spark SQL, DataFrames, Datasets, Conventional Streaming, Structured Streaming, MLlib, and GraphX) and the Hadoop core components (HDFS, MapReduce, and YARN) are explored in depth, with implementation examples on Spark + Hadoop clusters.
With big data analytics moving away from MapReduce toward Spark, the advantages of Spark over MapReduce are explained in depth so that readers can reap the benefits of in-memory speeds. The DataFrames API, the Data Sources API, and the new Dataset API are explained for building big data analytical applications. Real-time data analytics using Spark Streaming with Apache Kafka and HBase is covered to help readers build streaming applications. The new Structured Streaming concept is explained with an IoT (Internet of Things) use case. Machine learning techniques are covered using MLlib, ML Pipelines, and SparkR, and graph analytics is covered with the GraphX and GraphFrames components of Spark.
Readers will also get started with web-based notebooks such as Jupyter and Apache Zeppelin, and with the data flow tool Apache NiFi, to analyze and visualize data.
What you will learn
- Discover and apply the tools and techniques of big data analytics using Spark on Hadoop clusters, along with the wide variety of tools used with Spark and Hadoop
- Understand all the Hadoop and Spark ecosystem components
- Get to know all the Spark components: Spark Core, Spark SQL, DataFrames, Datasets, Conventional and Structured Streaming, MLlib, ML Pipelines, and GraphX
- See batch and real-time data analytics using Spark Core, Spark SQL, and Conventional and Structured Streaming
- Get to grips with data science and machine learning using MLlib, ML Pipelines, H2O, Hivemall, GraphX, and SparkR
About the Author
Venkat Ankam has over 18 years of IT experience and over 5 years in big data technologies, working with customers to design and develop scalable big data applications. Having worked with multiple clients globally, he has tremendous experience in big data analytics using Hadoop and Spark.
He is a Cloudera Certified Hadoop Developer and Administrator and also a Databricks Certified Spark Developer. He is the founder and presenter of a few Hadoop and Spark meetup groups globally and loves to share knowledge with the community.
Venkat has delivered hundreds of training sessions, presentations, and white papers in the big data sphere. While this is his first attempt at writing a book, many more books are in the pipeline.
Table of Contents
- Big Data Analytics at a 10,000-Foot View
- Getting Started with Apache Hadoop and Apache Spark
- Deep Dive into Apache Spark
- Big Data Analytics with Spark SQL, DataFrames, and Datasets
- Real-Time Analytics with Spark Streaming and Structured Streaming
- Notebooks and Dataflows with Spark and Hadoop
- Machine Learning with Spark and Hadoop
- Building Recommendation Systems with Spark and Mahout
- Graph Analytics with GraphX
- Interactive Analytics with SparkR