High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark (Paperback)
Holden Karau, Rachel Warren
Product description
Apache Spark is amazing when everything clicks. But if you haven’t seen the performance improvements you expected, or still don’t feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources.
Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you’ll also learn how to make it sing.
With this book, you’ll explore:
- How Spark SQL’s new interfaces improve performance over Spark’s RDD data structure
- The choice between data joins in Core Spark and Spark SQL
- Techniques for getting the most out of standard RDD transformations
- How to work around performance issues in Spark’s key/value pair paradigm
- Writing high-performance Spark code without Scala or the JVM
- How to test for functionality and performance when applying suggested improvements
- Using Spark MLlib and Spark ML machine learning libraries
- Spark’s Streaming components and external community packages