Practical Implementation of a Data Lake: Translating Customer Expectations Into Tangible Technical Goals

Paul, Nayanjyoti

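  • Publisher: Apress
  • Publication date: 2023-10-04
  • List price: $1,270
  • VIP price: $1,207 (5% off)
  • Language: English
  • Pages: 202
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 1484297342
  • ISBN-13: 9781484297346
  • Related categories: Big Data, Databases, Data Science
  • Overseas special-order title (must be checked out separately)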

Product Description

This book explains how to implement a data lake strategy, covering the technical and business challenges architects commonly face. It also illustrates how and why client requirements should drive architectural decisions.

Drawing upon a specific case from his own experience, author Nayanjyoti Paul begins with the consideration from which all subsequent decisions should flow: what does your customer need? He also describes the importance of identifying key stakeholders and the key points to focus on when starting a new project. Next, he takes you through the business and technical requirement-gathering process and shows how to translate customer expectations into tangible technical goals. From there, you'll gain insight into the security model that will allow you to establish security and legal guardrails, as well as different aspects of security from the end user's perspective. You'll learn which organizational roles need to be onboarded into the data lake, their responsibilities, the services they need access to, and how the hierarchy of escalations should work. Subsequent chapters explore how to divide your data lake into zones, organize data for security and access, manage data sensitivity, and apply data obfuscation techniques. Audit and logging capabilities in the data lake are also covered before a deep dive into designing data lakes to handle multiple kinds of data, file formats, and access patterns. The book concludes by focusing on production operationalization and solutions for implementing a production setup.

After completing this book, you will understand how to implement a data lake and the best practices to employ while doing so, and you will be armed with practical tips to solve business problems.

What You Will Learn

  • Understand the challenges associated with implementing a data lake
  • Explore the architectural patterns and processes used to design a new data lake
  • Design and implement data lake capabilities
  • Associate business requirements with technical deliverables to drive success

Who This Book Is For

Data Scientists and Architects, Machine Learning Engineers, and Software Engineers.

About the Author

Nayanjyoti Paul is an Associate Director and Chief Azure Architect for the GenAI and LLM CoE at Accenture. He is the product owner and creator of a patented asset. He currently leads multiple projects as lead architect in generative AI, large language models, data analytics, and machine learning. Nayan is a certified Master Technology Architect, certified Data Scientist, and certified Databricks Champion with additional AWS and Azure certifications. He has spoken at conferences such as Strata Conference, DataWorks Summit, and AWS re:Invent. He also delivers guest lectures at universities.
