Learning Spark SQL
Provisional Chinese title: 學習 Spark SQL
Aurobindo Sarkar
- Publisher: Packt Publishing
- Publication date: 2017-09-04
- List price: $2,420
- Member price: 5% off, $2,299
- Language: English
- Pages: 452
- Binding: Paperback
- ISBN: 1785888358
- ISBN-13: 9781785888359
Related categories:
Spark, SQL
Overseas special-order book (checked out separately)
Product Description
Design, implement, and deliver successful streaming applications, machine learning pipelines, and graph applications using the Spark SQL API
About This Book
- Learn about the design and implementation of streaming applications, machine learning pipelines, deep learning, and large-scale graph processing applications using Spark SQL APIs and Scala.
- Learn data exploration, data munging, and how to process structured and semi-structured data using real-world datasets, and gain hands-on experience with the issues and challenges of working with noisy, "dirty" real-world data.
- Understand design considerations for scalability and performance in web-scale Spark application architectures.
Who This Book Is For
If you are a developer, engineer, or architect who wants to learn how to use Apache Spark in a web-scale project, this is the book for you. Prior knowledge of SQL querying is assumed. Basic programming knowledge of Scala, Java, R, or Python is all you need to get started with this book.
What You Will Learn
- Familiarize yourself with Spark SQL programming, including working with the DataFrame/Dataset API and SQL
- Perform a series of hands-on exercises with different types of data sources, including CSV, JSON, Avro, MySQL, and MongoDB
- Perform data quality checks, data visualization, and basic statistical analysis tasks
- Perform data munging tasks on publicly available datasets
- Learn how to use Spark SQL and Apache Kafka to build streaming applications
- Learn key performance-tuning tips and tricks in Spark SQL applications
- Learn key architectural components and patterns in large-scale Spark SQL applications
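The first and third items above can be illustrated with a short sketch (not taken from the book). It runs the same per-category aggregation through the DataFrame API and through SQL on a temporary view, then counts nulls per column as a basic data-quality check. It assumes Spark 2.x or later with the `spark-sql` module on the classpath and a local master; the object name `SparkSqlSketch`, its helper methods, and the `sales` data are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, sum, when}

object SparkSqlSketch {
  // Hypothetical sample data: (category, amount); one amount is missing.
  def sampleSales(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(String, Option[Double])](
      ("books", Some(12.0)),
      ("books", Some(8.0)),
      ("music", Some(5.0)),
      ("music", None)
    ).toDF("category", "amount")
  }

  // Total amount per category, written with the DataFrame API.
  def totalsViaApi(sales: DataFrame): DataFrame =
    sales.groupBy("category").agg(sum("amount").as("total")).orderBy("category")

  // The same query written in SQL against a temporary view.
  def totalsViaSql(spark: SparkSession, sales: DataFrame): DataFrame = {
    sales.createOrReplaceTempView("sales")
    spark.sql(
      "SELECT category, SUM(amount) AS total FROM sales GROUP BY category ORDER BY category")
  }

  // Data-quality check: count(when(isNull)) counts only the rows where the column is null.
  def nullCounts(df: DataFrame): Map[String, Long] = {
    val exprs = df.columns.toSeq.map(c => count(when(col(c).isNull, 1)).as(c))
    val row = df.select(exprs: _*).head()
    df.columns.map(c => c -> row.getAs[Long](c)).toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-sketch")
      .master("local[*]")
      .getOrCreate()
    val sales = sampleSales(spark)
    totalsViaApi(sales).show()
    totalsViaSql(spark, sales).show()
    println(nullCounts(sales))
    spark.stop()
  }
}
```

Both versions go through the same Catalyst optimizer, so choosing between the DataFrame API and SQL is usually a matter of style and tooling rather than performance.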
In Detail
In the past year, Apache Spark has been increasingly adopted for developing distributed applications. The Spark SQL APIs provide an optimized interface that helps developers build such applications quickly and easily. However, designing web-scale production applications with the Spark SQL APIs can be complex, so understanding design and implementation best practices before you start your project will help you avoid common pitfalls.
This book gives insight into the engineering practices used to design and build real-world Spark-based applications. Its hands-on examples will give you the confidence to tackle future Spark SQL projects.
It starts by familiarizing you with data exploration and data munging tasks using Spark SQL and Scala. Extensive code examples show the methods used to implement typical use cases for various types of applications. You will get a walkthrough of the key concepts and terms common to streaming, machine learning, and graph applications. You will also learn key performance-tuning details for Spark SQL applications, including the cost-based optimization (CBO) introduced in Spark 2.2. Finally, you will learn how such systems are architected and deployed so that you can deliver your project successfully.
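For the cost-based optimization mentioned above, a minimal sketch of enabling CBO and collecting the statistics it relies on might look like the following. It assumes Spark 2.2 or later with `spark-sql` on the classpath. `spark.sql.cbo.enabled` and `spark.sql.cbo.joinReorder.enabled` are real Spark SQL settings (both off by default), while the `sales_cbo` table, its data, and the `CboSketch` helpers are hypothetical. `ANALYZE TABLE` needs a catalog table rather than a temporary view, so the sketch writes one with `saveAsTable` into the default local warehouse directory.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CboSketch {
  // Local session with the cost-based optimizer and CBO join reordering enabled.
  def session(): SparkSession =
    SparkSession.builder()
      .appName("cbo-sketch")
      .master("local[*]")
      .config("spark.sql.cbo.enabled", "true")
      .config("spark.sql.cbo.joinReorder.enabled", "true")
      .getOrCreate()

  // Writes a small hypothetical table and collects the statistics CBO uses.
  def analyzedSales(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("books", 12.0), ("books", 8.0), ("music", 5.0))
      .toDF("category", "amount")
      .write.mode("overwrite").saveAsTable("sales_cbo")
    // Table-level statistics (size in bytes, row count).
    spark.sql("ANALYZE TABLE sales_cbo COMPUTE STATISTICS")
    // Column-level statistics (distinct count, min/max, null count, and so on).
    spark.sql("ANALYZE TABLE sales_cbo COMPUTE STATISTICS FOR COLUMNS category, amount")
    spark.table("sales_cbo")
  }

  def main(args: Array[String]): Unit = {
    val spark = session()
    val sales = analyzedSales(spark)
    // Prints the parsed, analyzed, optimized, and physical plans for a filtered scan.
    sales.filter(sales("amount") > 6.0).explain(true)
    spark.stop()
  }
}
```

CBO can only estimate cardinalities for tables whose statistics have been collected, and column statistics are not refreshed automatically by default, so they are typically recomputed after large data loads.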
Style and Approach
This book is a hands-on guide to designing, building, and deploying Spark SQL-centric production applications at scale.