The Azure Data Lakehouse Toolkit: Building and Scaling Data Lakehouses with Delta Lake, Apache Spark, Azure Databricks and Synapse Analytics, and Snow
L'Esteve, Ron
- Publisher: Apress
- Publication date: 2022-07-14
- List price: $2,350
- VIP price: 5% off, $2,233
- Language: English
- Pages: 390
- Binding: Quality Paper - also called trade paper
- ISBN: 1484282329
- ISBN-13: 9781484282328
- Related categories: Microsoft Azure, Spark
Overseas special-order title (requires separate checkout)
Product Description
Design and implement a modern data lakehouse on the Azure Data Platform using Delta Lake, Apache Spark, Azure Databricks, Azure Synapse Analytics, and Snowflake. This book teaches you the intricate details of the Data Lakehouse Paradigm and how to efficiently design a cloud-based data lakehouse with high-performance, cutting-edge Apache Spark capabilities in Azure Databricks, Azure Synapse Analytics, and Snowflake. You will learn to write efficient PySpark code for batch and streaming ELT jobs on Azure. You will also follow practical, scenario-based examples showing how to apply the capabilities of Delta Lake and Apache Spark to optimize performance and to secure, share, and manage high-volume, high-velocity, and high-variety data in your lakehouse with ease.
The patterns of success that you acquire from reading this book will help you hone your skills to build high-performing and scalable ACID-compliant lakehouses using flexible and cost-efficient decoupled storage and compute capabilities. Extensive coverage of Delta Lake ensures that you are aware of and can benefit from all that this new, open source storage layer can offer. In addition to the deep examples on Databricks in the book, there is coverage of alternative platforms such as Synapse Analytics and Snowflake so that you can make the right platform choice for your needs.
After reading this book, you will be able to implement Delta Lake capabilities, including Schema Evolution, Change Feed, Live Tables, Sharing, and Clones, to enable better business intelligence and advanced analytics on your data within the Azure Data Platform.
What You Will Learn
- Implement the Data Lakehouse Paradigm on Microsoft’s Azure cloud platform
- Benefit from the new Delta Lake open-source storage layer for data lakehouses
- Take advantage of schema evolution, change feeds, live tables, and more
- Write functional PySpark code for data lakehouse ELT jobs
- Optimize Apache Spark performance through partitioning, indexing, and other tuning options
- Choose between alternatives such as Databricks, Synapse Analytics, and Snowflake
Who This Book Is For
This book is for data, analytics, and AI professionals at all levels, including data architects and data engineers. It is also for data professionals seeking patterns of success by which to remain relevant as they learn to build scalable data lakehouses for their organizations and for customers who are migrating to the modern Azure Data Platform.
About the Author
Ron C. L’Esteve is a professional author, trusted technology leader, and digital innovation strategist residing in Chicago, IL, USA. He is well-known for his impactful books and award-winning article publications about Azure Data & AI Architecture and Engineering. He possesses deep technical skills and experience in designing, implementing, and delivering modern Azure Data & AI projects for numerous clients around the world.
Having several Azure Data, AI, and Lakehouse certifications under his belt, Ron has been a go-to technical advisor for some of the largest and most impactful Azure implementation projects on the planet. He has been responsible for scaling key data architectures, defining the road map and strategy for the future of data and business intelligence needs, and challenging customers to grow by thoroughly understanding the fluid business opportunities and enabling change by translating them into high-quality and sustainable technical solutions that solve the most complex challenges and promote digital innovation and transformation.
Ron is a gifted presenter and trainer, known for his innate ability to clearly articulate and explain complex topics to audiences of all skill levels. He applies a practical and business-oriented approach by taking transformational ideas from concept to scale. He is a true enabler of positive and impactful change by championing a growth mindset.