Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning

Benjamin Bengfort, Rebecca Bilbro, Tony Ojeda


Product Description

From news and speeches to informal chatter on social media, natural language is one of the richest and most underutilized sources of data. Not only does it arrive in a constant stream, always changing and adapting to context; it also contains information that traditional data sources do not convey. The key to unlocking natural language is the creative application of text analytics. This practical book presents a data scientist’s approach to building language-aware products with applied machine learning.

You’ll learn robust, repeatable, and scalable techniques for text analysis with Python, including contextual and linguistic feature engineering, vectorization, classification, topic modeling, entity resolution, graph analysis, and visual steering. By the end of the book, you’ll be equipped with practical methods to solve any number of complex real-world problems.

  • Preprocess and vectorize text into high-dimensional feature representations
  • Perform document classification and topic modeling
  • Steer the model selection process with visual diagnostics
  • Extract key phrases, named entities, and graph structures to reason about data in text
  • Build a dialog framework to enable chatbots and language-driven interaction
  • Use Spark to scale processing power and neural networks to scale model complexity
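
To give a flavor of the first topic in the list above, here is a minimal sketch of turning raw text into numeric feature vectors with TF-IDF weighting. It is not taken from the book: the function names `tokenize` and `tfidf_vectorize`, the toy corpus, and the smoothed IDF formula are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (illustrative only)."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectorize(documents):
    """Return (vocabulary, vectors); each vector holds one TF-IDF weight per vocabulary term."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(documents)

    # Document frequency: the number of documents in which each term appears.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))

    # Smoothed inverse document frequency (an assumed variant; others exist).
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocabulary}

    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([counts[tok] / total * idf[tok] for tok in vocabulary])
    return vocabulary, vectors


# Hypothetical toy corpus, for demonstration only.
corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "Spark scales text processing.",
]
vocabulary, vectors = tfidf_vectorize(corpus)
```

Each document becomes a vector with one dimension per vocabulary term, so terms absent from a document get a weight of zero. Real corpora yield thousands of dimensions, which is why such representations are called high-dimensional.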
