Supervised Machine Learning for Text Analysis in R

Name: Supervised Machine Learning for Text Analysis in R
Price: 2161 TWD
Availability: InStock
Author: Emil Hvitfeldt, Julia Silge
ISBN: 0367554194

Publisher: CRC
Publication date: 2021-10-22
List price: $2,275
Member price: 5% off, $2,161
Language: English
Pages: 392
Binding: Quality Paper - also called trade paper
ISBN: 0367554194
ISBN-13: 9780367554194
Related categories: Machine Learning
Other editions: Supervised Machine Learning for Text Analysis in R

Ships immediately (stock: 1)

Product description

Text data is important for many domains, from healthcare to marketing to the digital humanities, but specialized approaches are necessary to create features for machine learning from language. Supervised Machine Learning for Text Analysis in R explains how to preprocess text data for modeling, train models, and evaluate model performance using tools from the tidyverse and tidymodels ecosystem. Models like these can be used to make predictions for new observations, to understand what natural language features or characteristics contribute to differences in the output, and more. If you are already familiar with the basics of predictive modeling, use the comprehensive, detailed examples in this book to extend your skills to the domain of natural language processing.

This book provides practical guidance and directly applicable knowledge for data scientists and analysts who want to integrate unstructured text data into their modeling pipelines. Learn how to use text data for both regression and classification tasks, and how to apply more straightforward algorithms like regularized regression or support vector machines as well as deep learning approaches. Natural language must be dramatically transformed to be ready for computation, so we explore typical text preprocessing and feature engineering steps like tokenization and word embeddings from the ground up. These steps influence model results in ways we can measure, both in terms of model metrics and other tangible consequences such as how fair or appropriate model results are.

About the authors

Emil Hvitfeldt is a clinical data analyst working in healthcare and an adjunct professor at American University, where he teaches statistical machine learning with tidymodels. He is also an open source R developer and author of the textrecipes package.

Julia Silge is a data scientist and software engineer at RStudio PBC, where she works on open source modeling tools. She is an author, an international keynote speaker and educator, and a real-world practitioner focusing on data analysis and machine learning practice.