Scala and Spark for Big Data Analytics
Provisional Chinese title: Scala 與 Spark 在大數據分析中的應用
Md. Rezaul Karim, Sridhar Alla
- Publisher: Packt Publishing
- Publication date: 2017-07-22
- Price: $2,820
- VIP price: 5% off, $2,679
- Language: English
- Pages: 796
- Binding: Paperback
- ISBN: 1785280848
- ISBN-13: 9781785280849
- Related categories: JVM languages, Spark, Big Data, Data Science
- Related translation: Scala和Spark大數據分析 函數式編程、數據流和機器學習 (Simplified Chinese edition)
Product Description
Key Features
- Learn Scala's sophisticated type system, which combines functional programming and object-oriented concepts
- Work on a wide array of applications, from simple batch jobs to stream processing and machine learning
- Explore the most common use cases, as well as some complex ones, for large-scale data analysis with Spark
Book Description
Scala has seen a steady rise in adoption over the past few years, especially in data science and analytics. Going hand in hand with Scala is Apache Spark, which is built on Scala and widely used in analytics.
If you want to leverage the power of both Scala and Spark to make sense of Big Data, then this book is for you.
This book is divided into three parts. The first part introduces you to Scala programming, helping you understand its fundamentals so that you can program with Spark. The second part introduces Spark and the design choices behind it, and shows you how to perform data analysis with it. Finally, to shake things up, the book moves on to advanced Spark topics such as monitoring, configuration, debugging, testing, and deployment.
By the end of this book, you will be able to perform full stack data analysis with Spark and feel that no amount of data is too big.
What you will learn
- Understand the basics of Scala and explore functional programming.
- Get familiar with the Collections API, one of the most prominent features of the standard library.
- Work with RDDs, the basic abstraction behind Apache Spark.
- Use Spark to analyze structured and unstructured data, and work with Spark SQL's APIs.
- Take advantage of Spark for the analysis of streaming data and explore interoperability with streaming software like Apache Kafka.
- Use common machine learning techniques like dimensionality reduction and one-hot encoding, and build a predictive model using Spark.
- Use Bayesian inference to build another kind of classification model and understand when the decision tree algorithm should be used.
- Build a clustering model and use it to make predictions.
- Tune your application and use Spark Testing Base.
- Deploy a full Spark application on a cluster using Mesos.