Statistical Significance Testing for Natural Language Processing

Name: Statistical Significance Testing for Natural Language Processing
Price: 2451 TWD
Availability: OnlineOnly
Author: Rotem Dror, Lotem Peled-Cohen, Segev Shlomov
ISBN: 1681737973

Publisher: Morgan & Claypool
Publication date: 2020-04-03
List price: $2,580
VIP price: 5% off, $2,451
Language: English
Pages: 118
Binding: Hardcover - also called cloth, retail trade, or trade
ISBN: 1681737973
ISBN-13: 9781681737973

Overseas special-order book (requires separate checkout)

Product Description

Data-driven experimental analysis has become the main evaluation tool for Natural Language Processing (NLP) algorithms. In fact, over the last decade it has become rare to see an NLP paper, particularly one that proposes a new algorithm, that does not include extensive experimental analysis, and the number of tasks, datasets, domains, and languages involved is constantly growing. This emphasis on empirical results highlights the role of statistical significance testing in NLP research: if we, as a community, rely on empirical evaluation to validate our hypotheses and reveal the correct language processing mechanisms, we had better be sure that our results are not coincidental.

The goal of this book is to discuss the main aspects of statistical significance testing in NLP. Our guiding assumption throughout the book is that the basic question NLP researchers and engineers deal with is whether or not one algorithm can be considered better than another. This question drives the field forward, because answering it enables steady progress toward better technology for language processing challenges. In practice, researchers and engineers would like to draw the right conclusion from a limited set of experiments, and this conclusion should also hold for experiments on datasets they do not have at their disposal, or for experiments they cannot perform due to limited time and resources. The book hence discusses the opportunities and challenges in using statistical significance testing in NLP, from the point of view of experimental comparison between two algorithms. We cover topics such as choosing an appropriate significance test for the major NLP tasks, dealing with the unique aspects of significance testing for non-convex deep neural networks, accounting for a large number of comparisons between two NLP algorithms in a statistically valid manner (multiple hypothesis testing), and, finally, the unique challenges posed by the nature of the data and the practices of the field.
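To give a flavor of the kind of comparison the description refers to, here is a minimal Python sketch of a paired sign-flip permutation test on per-example scores from two systems, followed by a Bonferroni correction for several such comparisons. It is an illustration only, not necessarily the procedure the book recommends; the function names, the default of 10,000 resamples, and the example scores are all assumptions made for this sketch.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Approximate one-sided p-value for H0: system A is not better than B.

    scores_a and scores_b are per-example scores on the SAME test examples
    (hypothetical inputs for this sketch). Under H0 the sign of each paired
    difference is arbitrary, so we flip signs at random and count how often
    the resampled mean difference is at least as large as the observed one.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = sum(diffs) / n
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        resampled = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if resampled >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly above zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


def bonferroni_reject(p_values, alpha=0.05):
    """Bonferroni correction: reject H0_i only if p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


if __name__ == "__main__":
    # Made-up per-example scores for two hypothetical systems.
    system_a = [0.81, 0.77, 0.90, 0.68, 0.85, 0.79, 0.88, 0.74, 0.83, 0.80]
    system_b = [0.78, 0.75, 0.86, 0.66, 0.80, 0.77, 0.84, 0.70, 0.82, 0.76]
    print("p-value:", paired_permutation_test(system_a, system_b))
    print("rejections:", bonferroni_reject([0.001, 0.04, 0.2]))
```

The test is paired because both systems are scored on the same examples, which is the usual setup when comparing two algorithms on a shared test set. Bonferroni is the simplest and most conservative correction for multiple comparisons; less conservative alternatives exist.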
