Data-Intensive Text Processing with MapReduce (Paperback)

Jimmy Lin, Chris Dyer

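  • Publisher: Morgan & Claypool
  • Publication date: 2010-04-30
  • List price: $1,600
  • Member price: $1,520 (5% off)
  • Language: English
  • Pages: 178
  • Binding: Paperback
  • ISBN: 1608453421
  • ISBN-13: 9781608453429
  • Related categories: Distributed architecture
  • Overseas special-order title (requires separate checkout)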

Product description

Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications. Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever. MapReduce is both a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. The programming model provides an easy-to-understand abstraction for designing scalable algorithms, while the execution framework transparently handles many system-level details, ranging from scheduling to synchronization to fault tolerance. This book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning. We introduce the notion of MapReduce design patterns, which represent general reusable solutions to commonly occurring problems across a variety of problem domains. This book not only aims to help the reader "think in MapReduce" but also discusses the limitations of the programming model.

Table of Contents: Introduction / MapReduce Basics / MapReduce Algorithm Design / Inverted Indexing for Text Retrieval / Graph Algorithms / EM Algorithms for Text Processing / Closing Remarks
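To make the programming model in the description more concrete, here is a minimal single-process sketch of a word count expressed as a map function and a reduce function. It is an illustration only, not code from the book and not the API of Hadoop or any other real framework: the names `map_word_count`, `reduce_word_count`, and `run_mapreduce` are hypothetical, and the in-memory grouping step merely stands in for the shuffle, scheduling, and fault tolerance that a real execution framework would handle across a cluster.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical helper names; this simulates the model in one process.


def map_word_count(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_word_count(word: str, counts: Iterable[int]) -> Iterator[Tuple[str, int]]:
    """Reduce step: sum all partial counts emitted for one word."""
    yield word, sum(counts)


def run_mapreduce(
    records: Iterable[Tuple[str, str]],
    mapper: Callable[[str, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Iterator[Tuple[str, int]]],
) -> Dict[str, int]:
    """Run mapper and reducer locally, grouping values by key in between."""
    # Stand-in for the shuffle phase: collect every mapped value under its key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)

    # Call the reducer once per key, visiting keys in sorted order.
    results: Dict[str, int] = {}
    for out_key in sorted(groups):
        for final_key, final_value in reducer(out_key, groups[out_key]):
            results[final_key] = final_value
    return results


docs = [("d1", "a rose is a rose"), ("d2", "a dog is a dog")]
counts = run_mapreduce(docs, map_word_count, reduce_word_count)
print(counts)
```

The point of the sketch is the division of labor: the two user-supplied functions contain all of the problem-specific logic, while everything in `run_mapreduce` is the part a framework would take over.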
