Beginning Apache Spark 3: With Dataframe, Spark Sql, Structured Streaming, and Spark Machine Learning Library
Luu, Hien
- Publisher: Apress
- Publication date: 2021-10-23
- Price: $2,520
- VIP price: 5% off, $2,394
- Language: English
- Pages: 390
- Binding: Quality Paper (also called trade paper)
- ISBN: 1484273826
- ISBN-13: 9781484273821
Related categories:
Spark、SQL、Machine Learning
Overseas special-order book (requires separate checkout)
Product Description
Take a journey toward discovering, learning, and using Apache Spark 3.0. In this book, you will gain expertise in the powerful and efficient distributed data processing engine inside Apache Spark; its user-friendly, comprehensive, and flexible programming model for processing data in batch and streaming; and the scalable machine learning algorithms and practical utilities for building machine learning applications.
Beginning Apache Spark 3 begins by explaining Spark concepts and architecture, the Spark unified stack, and the different ways of interacting with Apache Spark. Next, it offers an overview of Spark SQL before moving on to its advanced features. It covers tips and techniques for dealing with performance issues, followed by an overview of the Structured Streaming processing engine. It concludes by demonstrating how to develop machine learning applications using Spark MLlib and how to manage the machine learning development lifecycle. The book is packed with practical examples and code snippets that help you master each concept and feature right after the section that covers it.
After reading this book, you will have the knowledge required to build your own big data pipelines, applications, and machine learning applications.
What You Will Learn
- Master the Spark unified data analytics engine and its various components
- Understand how these components work in tandem to provide a scalable, fault-tolerant, and performant data processing engine
- Leverage the user-friendly and flexible programming model to perform simple to complex data analytics using DataFrames and Spark SQL
- Develop machine learning applications using Spark MLlib
- Manage the machine learning development lifecycle using MLflow
Who This Book Is For
Data scientists, data engineers and software developers.
About the Author
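Hien Luu has extensive experience designing and building big data applications and machine learning infrastructure. He is particularly passionate about the intersection of big data and machine learning. Hien enjoys working with open source software and has contributed to Apache Pig and Azkaban. Teaching is another of his passions: he is an instructor at UCSC Silicon Valley Extension, where he teaches Apache Spark. He has spoken at conferences such as Data+AI Summit, MLOps World, QCon SF, QCon London, Hadoop Summit, and JavaOne.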