NLTK Essentials
Provisional Chinese title: NLTK 基礎知識

Nitin Hardeniya

Product Description

Build cool NLP and machine learning applications using NLTK and other Python libraries

About This Book

  • Extract information from unstructured data using NLTK to solve NLP problems
  • Analyze linguistic structures in text and learn the concepts of semantic analysis and parsing
  • Learn text analysis, text mining, and web crawling in a simplified manner

Who This Book Is For

If you are an NLP or machine learning enthusiast with little or no experience in text processing, then this book is for you. This book is also ideal for expert Python programmers who want to learn NLTK quickly.

What You Will Learn

  • Get a glimpse of the complexity of natural languages and how they are processed by machines
  • Clean and wrangle text using tokenization and chunking to help you better process data
  • Explore the different types of tags available and learn how to tag sentences
  • Create a customized parser and tokenizer to suit your needs
  • Build a real-life application with features such as spell correction, search, machine translation, and question answering
  • Retrieve data content using crawling and scraping
  • Perform feature extraction and selection, and build a classification system for different pieces of text
  • Use various other Python libraries such as pandas, scikit-learn, matplotlib, and gensim
  • Analyze social media sites to discover trending topics and perform sentiment analysis
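
To give a flavor of the tokenization and feature-extraction topics listed above, here is a minimal sketch that uses only the Python standard library. It is not taken from the book and does not use NLTK's API: the names `tokenize`, `bag_of_words`, `TOKEN_PATTERN`, and `STOPWORDS` are hypothetical, and the regular expression is a simplifying assumption that handles only plain English words and simple contractions.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not NLTK's tokenizer):
# runs of letters/digits, optionally followed by an apostrophe suffix ("don't").
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")

# A tiny illustrative stop-word set; real stop-word lists are much longer.
STOPWORDS = frozenset({"the", "a", "is", "and", "of", "to"})


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def bag_of_words(tokens, stopwords=frozenset()):
    """Count every token that is not in the stop-word set."""
    return Counter(token for token in tokens if token not in stopwords)


sample = "NLTK makes tokenization simple, and NLTK makes tagging simple too."
tokens = tokenize(sample)
features = bag_of_words(tokens, STOPWORDS)

print(tokens)
print(features)
```

The resulting counts form a simple bag-of-words representation, the kind of feature dictionary that a text classifier can consume. A dedicated library such as NLTK offers far more robust tokenizers than this regular expression.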

In Detail

Natural Language Processing (NLP) is the field of artificial intelligence and computational linguistics that deals with the interactions between computers and human languages. As human-computer interaction becomes more common, it is increasingly important for computers to comprehend the major natural languages. The Natural Language Toolkit (NLTK) is a powerful and robust Python library built for this purpose.

You will start with an introduction that gives you the gist of how to build systems around NLP. You will then explore data science-related tasks, after which you will learn how to create a customized tokenizer and parser from scratch. Throughout, the book delves into the essential concepts of NLP while giving you practical insight into the open source tools and libraries available in Python for NLP. You will then learn how to analyze social media sites to discover trending topics and perform sentiment analysis. Finally, you will see tools that help you deal with large-scale text.

By the end of this book, you will be confident about NLP and data science concepts and know how to apply them in your day-to-day work.
