Practical Text Mining with Perl (Hardcover)

Roger Bilisoly

  • Publisher: Wiley
  • Publication date: 2008-08-01
  • List price: $3,670
  • Sale price: $3,487 (5% off)
  • Language: English
  • Pages: 320
  • Binding: Hardcover
  • ISBN: 0470176431
  • ISBN-13: 9780470176436
  • Related categories: Perl programming language, Text-mining
  • Ships immediately (in stock: 1)

Product description

Provides readers with the methods, algorithms, and means to perform text mining tasks

This book is devoted to the fundamentals of text mining using Perl, an open-source programming tool that is freely available via the Internet (www.perl.org). It covers mining ideas from several perspectives--statistics, data mining, linguistics, and information retrieval--and provides readers with the means to successfully complete text mining tasks on their own.

The book begins with an introduction to regular expressions (a text pattern methodology) and quantitative text summaries, both of which are fundamental tools for analyzing text. Then, it builds upon this foundation to explore:

  • Probability and texts, including the bag-of-words model
  • Information retrieval techniques such as the TF-IDF similarity measure
  • Concordance lines and corpus linguistics
  • Multivariate techniques such as correlation, principal components analysis, and clustering
  • Perl modules, German, and permutation tests

Each chapter is devoted to a single key topic, and the author carefully and thoughtfully introduces mathematical concepts as they arise, allowing readers to learn as they go without having to refer to additional books. The inclusion of numerous exercises and worked-out examples further complements the book's student-friendly format.

Practical Text Mining with Perl is ideal as a textbook for undergraduate and graduate courses in text mining and as a reference for a variety of professionals who are interested in extracting information from text documents.
