Laboratory Experiments in Information Retrieval: Sample Sizes, Effect Sizes, and Statistical Power (The Information Retrieval Series)

Tetsuya Sakai

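  • Publisher: Springer
  • Publication date: 2018-10-04
  • List price: $2,420
  • VIP price: $2,299 (5% off)
  • Language: English
  • Pages: 150
  • Binding: Hardcover
  • ISBN: 9811311986
  • ISBN-13: 9789811311987
  • Overseas special-order title (requires separate checkout)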

Product Description

Covering aspects from principles and limitations of statistical significance tests to topic set size design and power analysis, this book guides readers to statistically well-designed experiments. Although classical statistical significance tests are to some extent useful in information retrieval (IR) evaluation, they can harm research unless they are used appropriately with the right sample sizes and statistical power and unless the test results are reported properly. The first half of the book is mainly targeted at undergraduate students, and the second half is suitable for graduate students and researchers who regularly conduct laboratory experiments in IR, natural language processing, recommendations, and related fields.

Chapters 1–5 review parametric significance tests for comparing system means, namely, t-tests and ANOVAs, and show how easily they can be conducted using Microsoft Excel or R. These chapters also discuss a few multiple comparison procedures for researchers who are interested in comparing every system pair, including a randomised version of Tukey's Honestly Significant Difference test. The chapters then deal with known limitations of classical significance testing and provide practical guidelines for reporting research results regarding comparison of means.

Chapters 6 and 7 discuss statistical power. Chapter 6 introduces topic set size design to enable test collection builders to determine an appropriate number of topics to create. Readers can easily use the author’s Excel tools for topic set size design based on the paired and two-sample t-tests, one-way ANOVA, and confidence intervals. Chapter 7 describes power-analysis-based methods for determining an appropriate sample size for a new experiment based on a similar experiment done in the past, detailing how to utilize the author’s R tools for power analysis and how to interpret the results. Case studies from IR for both Excel-based topic set size design and R-based power analysis are also provided.
