Learning Real-time Processing with Spark Streaming
Sumit Gupta
- Publisher: Packt Publishing
- Publication date: 2015-09-29
- List price: $1,700
- Member price: 5% off, $1,615
- Language: English
- Pages: 200
- Binding: Paperback
- ISBN: 1783987669
- ISBN-13: 9781783987665
Product Description
Building scalable and fault-tolerant streaming applications made easy with Spark Streaming
About This Book
- Process live data streams more efficiently with better fault recovery using Spark Streaming
- Implement and deploy real-time log file analysis
- Learn about integration with advanced Spark libraries: GraphX, Spark SQL, and MLlib.
Who This Book Is For
This book is intended for big data developers who have basic knowledge of Scala but no prior knowledge of Spark. It will help you grasp the basics of developing real-time applications with Spark and learn to program its core elements and applications efficiently.
What You Will Learn
- Install and configure Spark and Spark Streaming to execute applications
- Explore the architecture and components of Spark and Spark Streaming, and use them as a base for other libraries
- Process distributed log files in real time, loading data from distributed sources
- Apply transformation functions to streaming data
- Integrate Apache Spark with advanced libraries such as MLlib and GraphX
- Deploy your application using production deployment scenarios
In Detail
Using practical examples with easy-to-follow steps, this book will teach you how to build real-time applications with Spark Streaming.
Starting with installing and setting up the required environment, you will write and execute your first Spark Streaming program. This is followed by an exploration of the architecture and components of Spark Streaming, along with an overview of the libraries and functions exposed by Spark. Next, you will learn about the various client APIs for coding in Spark through the use case of distributed log file processing. You will then apply various functions to transform and enrich streaming data, and learn how to cache and persist datasets. Moving on, you will integrate Apache Spark with other Spark libraries and components such as MLlib, GraphX, and Spark SQL. Finally, you will learn about deploying your application, covering scenarios that range from standalone mode to distributed mode using Mesos and YARN, in private data centers or on cloud infrastructure.
Style and approach
A step-by-step approach to learning Spark Streaming in a structured manner, with detailed explanations of basic and advanced features in an easy-to-follow style. Each topic is explained sequentially and supported by real-world examples and executable code snippets that meet the needs of readers with a wide range of experience.