Mining Structures of Factual Knowledge from Text: An Effort-Light Approach
Provisional Chinese title: 從文本中挖掘事實知識結構:輕鬆的方法
Xiang Ren, Jiawei Han
- Publisher: Morgan & Claypool
- Publication date: 2018-06-27
- Price: $3,530
- VIP price: 5% off, $3,354
- Language: English
- Pages: 199
- Binding: Hardcover (also called cloth, retail trade, or trade)
- ISBN: 1681733943
- ISBN-13: 9781681733944
Overseas special-order book (requires separate checkout)
Product description
Real-world data, though massive, is largely unstructured and takes the form of natural-language text. Mining structures from massive text data without extensive human annotation and labeling is challenging but highly desirable. In this book, we investigate the principles and methodologies of mining structures of factual knowledge (e.g., entities and their relationships) from massive, unstructured text corpora.
Unlike many existing structure extraction methods, which rely heavily on human-annotated data for model training, our effort-light approach uses human-curated facts stored in external knowledge bases as distant supervision and exploits the rich data redundancy in large text corpora for context understanding. This effort-light mining approach leads to a series of new principles and powerful methodologies for structuring text corpora, including (1) entity recognition, typing, and synonym discovery, (2) entity relation extraction, and (3) open-domain attribute-value mining and information extraction. This book introduces this new research frontier and points out some promising research directions.
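To make the idea of distant supervision concrete, here is a minimal sketch and not the authors' method. It assumes a toy knowledge base of entity pairs and uses plain substring matching to find mentions. The names `KB`, `distant_label`, and `corpus` are hypothetical and exist only for this illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
# A real system would draw these facts from an external knowledge base.
KB: Dict[Tuple[str, str], str] = {
    ("Barack Obama", "Honolulu"): "born_in",
    ("Marie Curie", "Warsaw"): "born_in",
}


def distant_label(sentences: List[str]) -> List[Tuple[str, str, str, str]]:
    """Return (sentence, head, tail, relation) for every sentence that
    mentions both entities of a KB fact. No human labels the sentences,
    so the resulting training labels are noisy."""
    labeled = []
    for sentence in sentences:
        for (head, tail), relation in KB.items():
            # Assumption: naive substring matching stands in for entity recognition.
            if head in sentence and tail in sentence:
                labeled.append((sentence, head, tail, relation))
    return labeled


corpus = [
    "Barack Obama was born in Honolulu, Hawaii.",
    "Barack Obama visited Honolulu last winter.",  # matched, but the label is wrong
    "Marie Curie moved to Paris in 1891.",         # no KB pair present, left unlabeled
]

training_examples = distant_label(corpus)
for example in training_examples:
    print(example)
```

The second sentence receives the `born_in` label even though it does not express that relation. This kind of label noise is the price of skipping manual annotation, and exploiting redundancy across many sentences in a large corpus is one way to offset it.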