Data Management in Machine Learning Systems
Tentative Chinese title: 機器學習系統中的數據管理

Matthias Boehm, Arun Kumar, Jun Yang

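  • Publisher: Morgan & Claypool
  • Publication date: 2019-02-25
  • List price: $2,410
  • VIP price: $2,290 (95% of list price)
  • Language: English
  • Pages: 173
  • Binding: Quality paper (also called trade paper)
  • ISBN: 1681734966
  • ISBN-13: 9781681734965
  • Related category: Machine Learning
  • Overseas special-order title (must be checked out separately)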

Product description

Large-scale data analytics using machine learning (ML) underpins many modern data-driven applications. ML systems provide means of specifying and executing these ML workloads in an efficient and scalable manner. Data management is at the heart of many ML systems due to data-driven application characteristics, data-centric workload characteristics, and system architectures inspired by classical data management techniques.

In this book, we follow this data-centric view of ML systems and aim to provide a comprehensive overview of data management in ML systems for the end-to-end data science or ML lifecycle. We review multiple interconnected lines of work: (1) ML support in database (DB) systems, (2) DB-inspired ML systems, and (3) ML lifecycle systems. Covered topics include: in-database analytics via query generation and user-defined functions, factorized and statistical-relational learning; optimizing compilers for ML workloads; execution strategies and hardware accelerators; data access methods such as compression, partitioning and indexing; resource elasticity and cloud markets; as well as systems for data preparation for ML, model selection, model management, model debugging, and model serving. Given the rapidly evolving field, we strive for a balance between an up-to-date survey of ML systems, an overview of the underlying concepts and techniques, as well as pointers to open research questions. Hence, this book might serve as a starting point for both systems researchers and developers.
