Practical Hadoop Migration: How to Integrate Your RDBMS with the Hadoop Ecosystem and Re-Architect Relational Applications to NoSQL
Bhushan Lakhe
Product Description
Re-architect relational applications to NoSQL, integrate relational database management systems with the Hadoop ecosystem, and transform and migrate relational data to and from Hadoop components. This book covers the best-practice design approaches to re-architecting your relational applications and transforming your relational data to optimize concurrency, security, denormalization, and performance.
Bhushan Lakhe, winner of IBM’s 2012 Gerstner Award for his implementation of big data and data warehouse initiatives and author of Practical Hadoop Security, walks you through the entire transition process. First, he lays out the criteria for deciding what blend of re-architecting, migration, and integration between RDBMS and HDFS best meets your transition objectives. Then he demonstrates how to design your transition model.
Lakhe then covers the selection criteria for ETL tools, the implementation steps for migration with Sqoop- and Flume-based data transfers, and transition optimization techniques for tuning partitions, scheduling aggregations, and redesigning ETL. Finally, he assesses the pros and cons of data lakes and Lambda architecture as integrative solutions and illustrates their implementation with real-world case studies.
By default, Hadoop/NoSQL solutions do not offer certain features of relational technology, such as role-based access control, locking for concurrent updates, and tools for measuring and enhancing performance. Practical Hadoop Migration shows how to use open-source tools to emulate these relational functionalities in Hadoop ecosystem components.
What You'll Learn
- The requirements and design methodologies of relational data and NoSQL models
- How to decide whether you should migrate your relational applications to big data technologies or integrate them
- How to transition your relational applications to Hadoop/NoSQL platforms in terms of logical design and physical implementation
- RDBMS-to-HDFS integration, data transformation, and optimization techniques
- The situations in which Lambda architecture and data lake solutions should be considered
- How to select and implement Hadoop-based components and applications to speed transition, optimize integrated performance, and emulate relational functionalities
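The primary audience for this book is database developers, database administrators, enterprise architects, Hadoop/NoSQL developers, and IT leaders. The secondary audience includes project and program managers and advanced students of database and management information systems.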