Phrase Mining from Massive Text and Its Applications (Synthesis Lectures on Data Mining and Knowledge Discovery)

Jialu Liu, Jingbo Shang, Jiawei Han

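  • Publisher: Morgan & Claypool
  • Publication date: 2017-03-30
  • List price: $1,610
  • Member price: $1,530 (5% off)
  • Language: English
  • Pages: 90
  • Binding: Paperback
  • ISBN: 1627058982
  • ISBN-13: 9781627058988
  • Related categories: Data-mining
  • Overseas special-order title (must be checked out separately)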

Product description

A lot of digital ink has been spilled on "big data" over the past few years. Most of this surge owes its origin to the various types of unstructured data in the wild, among which the proliferation of text-heavy data is particularly overwhelming, attributed to the daily use of web documents, business reviews, news, social posts, etc., by so many people worldwide. A core challenge presents itself: How can one efficiently and effectively turn massive, unstructured text into structured representation so as to further lay the foundation for many other downstream text mining applications?

In this book, we investigate one promising paradigm for representing unstructured text, that is, through automatically identifying high-quality phrases from innumerable documents. In contrast to a list of frequent n-grams without proper filtering, users are often more interested in results based on variable-length phrases with certain semantics such as scientific concepts, organizations, slogans, and so on. We propose new principles and powerful methodologies to achieve this goal, from the scenario where a user can provide meaningful guidance to a fully automated setting through distant learning. This book also introduces applications enabled by the mined phrases and points out some promising research directions.
