Practical Hadoop Migration: How to Integrate Your RDBMS with the Hadoop Ecosystem and Re-Architect Relational Applications to NoSQL
Bhushan Lakhe
Product Description
Re-architect relational applications to NoSQL, integrate relational database management systems with the Hadoop ecosystem, and transform and migrate relational data to and from Hadoop components. This book covers the best-practice design approaches to re-architecting your relational applications and transforming your relational data to optimize concurrency, security, denormalization, and performance.
Bhushan Lakhe, winner of IBM’s 2012 Gerstner Award for his implementation of big data and data warehouse initiatives and author of Practical Hadoop Security, walks you through the entire transition process. First, he lays out the criteria for deciding what blend of re-architecting, migration, and integration between RDBMS and HDFS best meets your transition objectives. Then he demonstrates how to design your transition model.
Lakhe then covers the selection criteria for ETL tools, the implementation steps for migration with Sqoop- and Flume-based data transfers, and transition optimization techniques for tuning partitions, scheduling aggregations, and redesigning ETL. Finally, he assesses the pros and cons of data lakes and Lambda architecture as integrative solutions and illustrates their implementation with real-world case studies.
By default, Hadoop/NoSQL solutions do not offer certain features of relational technology, such as role-based access control, locking for concurrent updates, and various tools for measuring and enhancing performance. Practical Hadoop Migration shows how to use open-source tools to emulate such relational functionality in Hadoop ecosystem components.
What You'll Learn
- The requirements and design methodologies of relational data and NoSQL models
- How to decide whether you should migrate your relational applications to big data technologies or integrate them
- How to transition your relational applications to Hadoop/NoSQL platforms in terms of logical design and physical implementation
- RDBMS-to-HDFS integration, data transformation, and optimization techniques
- The situations in which Lambda architecture and data lake solutions should be considered
- How to select and implement Hadoop-based components and applications to speed transition, optimize integrated performance, and emulate relational functionalities