Beginning Apache Pig: Big Data Processing Made Easy
Provisional Chinese title: Apache Pig 入門:簡化大數據處理
Balaswamy Vaddeman
- Publisher: Apress
- Publication date: 2016-12-16
- Price: $1,780
- VIP price: 5% off, $1,691
- Language: English
- Pages: 274
- Binding: Paperback
- ISBN: 1484223365
- ISBN-13: 9781484223369
- Related categories: Big Data
Overseas special-order book (must be checked out separately)
Product Description
Learn to use Apache Pig to develop lightweight big data applications easily and quickly. This book shows you many optimization techniques and covers every context where Pig is used in big data analytics. Beginning Apache Pig shows you how Pig is easy to learn and requires relatively little time to develop big data applications.
The book is divided into four parts: the complete features of Apache Pig; integration with other tools; how to solve complex business problems; and optimization of tools.
You'll discover topics such as MapReduce and why it cannot meet every business need; the features of Pig Latin such as data types for each load, store, joins, groups, and ordering; how Pig workflows can be created; submitting Pig jobs using Hue; and working with Oozie. You'll also see how to extend the framework by writing UDFs and custom load, store, and filter functions. Finally you'll cover different optimization techniques such as gathering statistics about a Pig script, joining strategies, parallelism, and the role of data formats in good performance.
What You Will Learn
• Use all the features of Apache Pig
• Integrate Apache Pig with other tools
• Extend Apache Pig
• Optimize Pig Latin code
• Solve different use cases for Pig Latin
Who This Book Is For
All levels of IT professionals: architects, big data enthusiasts, engineers, developers, and big data administrators