The Handbook of NLP with Gensim: Leverage topic modeling to uncover hidden patterns, themes, and valuable insights within textual data

Kuo, Chris

  • Publisher: Packt Publishing
  • Publication date: 2023-10-27
  • Price: $1,800
  • VIP price (5% off): $1,710
  • Language: English
  • Pages: 310
  • Binding: Quality paper (also called trade paper)
  • ISBN: 1803244941
  • ISBN-13: 9781803244945
  • Related category: Text-mining
  • Ships immediately (in stock: 1)

Product description

Navigating NLP research and applying it in practice can be a formidable task, but The Handbook of NLP with Gensim makes it easy. This book demystifies NLP and equips you with hands-on strategies spanning healthcare, e-commerce, finance, and more, so that you can use Gensim in real-world scenarios. You'll begin by exploring the motivations behind, and techniques for, extracting information from text, such as bag-of-words, TF-IDF, and word embeddings. The book then guides you through topic modeling with methods such as Latent Semantic Analysis (LSA), for dimensionality reduction and for discovering latent semantic relationships in text data; Latent Dirichlet Allocation (LDA), for probabilistic topic modeling; and Ensemble LDA, for improving the stability and accuracy of topic models. Next, you'll learn text summarization techniques with Word2Vec and Doc2Vec, build the modeling pipeline, and optimize models by tuning their hyperparameters. As you become acquainted with practical applications in various industries, this book will inspire you to design innovative projects of your own. Alongside topic modeling, you'll explore named entity handling and NER tools, modeling procedures, and tools for building effective topic modeling applications. By the end of this book, you'll have mastered the techniques essential for creating applications with Gensim and integrating NLP into your business processes.