Statistical Methods for Imbalanced Data in Ecological and Biological Studies
Komori, Osamu; Eguchi, Shinto
- Publisher: Springer
- Publication date: 2019-07-15
- List price: $2,400
- VIP price: 5% off, $2,280
- Language: English
- Pages: 92
- Binding: Quality Paper (also called trade paper)
- ISBN: 4431555692
- ISBN-13: 9784431555698
Overseas special-order title (must be checked out separately)
Product Description
This book offers a fresh approach: it provides a comprehensive review of recent work on the challenging problems that imbalanced data cause in prediction and classification, and it introduces several of the latest statistical methods for dealing with them. The book discusses data imbalance from two points of view. The first is quantitative imbalance, in which the sample size of one population greatly exceeds that of another. An extreme case is presence-only data, in which the presence of a species is confirmed but information on its absence is uncertain; such data are especially common in ecology when predicting habitat distributions. The second is qualitative imbalance, in which the data distribution of one population can be well specified whereas that of the other is highly heterogeneous. Typical examples are the outliers commonly observed in gene expression data and the heterogeneous characteristics often observed in the case group of case-control studies. Extensions of the logistic regression model, maxent, and AdaBoost to imbalanced data are discussed, providing a new framework for improving prediction, classification, and variable selection performance. The weight functions introduced in these methods play an important role in alleviating data imbalance. The book also offers a new perspective on these problems and shows applications of the recently developed statistical methods to real data sets.
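To illustrate how a weight function can offset quantitative imbalance, here is a minimal sketch of class-weighted logistic regression on a single feature. It assumes the generic "balanced" weighting n / (2 · n_class), the same convention as scikit-learn's `class_weight="balanced"` for two classes; this is not the specific weight function proposed by Komori and Eguchi, and the function names and toy data are hypothetical. Because the intercept's score equation forces the weighted average of fitted probabilities to equal the weighted share of positives, balanced weights pull that average to 0.5 instead of the raw prevalence.

```python
import math


def inverse_frequency_weights(labels):
    """Give each example weight n / (2 * n_class) so both classes carry equal total weight.

    Generic "balanced" weighting, assumed here for illustration only; it is
    not the specific weight function proposed in the book.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return [n / (2 * n_pos) if y == 1 else n / (2 * n_neg) for y in labels]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logistic(xs, ys, weights, iters=50):
    """Fit p(y=1 | x) = sigmoid(b0 + b1 * x) by Newton's method on the weighted log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in zip(xs, ys, weights):
            p = sigmoid(b0 + b1 * x)
            r = w * (p - y)          # weighted residual: gradient of the negative log-likelihood
            g0 += r
            g1 += r * x
            s = w * p * (1.0 - p)    # weighted curvature: Hessian contribution
            h00 += s
            h01 += s * x
            h11 += s * x * x
        det = h00 * h11 - h01 * h01
        # Newton step beta <- beta - H^{-1} g, using the closed-form 2x2 inverse
        b0 -= (h11 * g0 - h01 * g1) / det
        b1 -= (h00 * g1 - h01 * g0) / det
    return b0, b1


if __name__ == "__main__":
    # Hypothetical toy data: 8 negatives and 2 positives on one feature.
    xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 3.0, 4.0]
    ys = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    plain = fit_weighted_logistic(xs, ys, [1.0] * len(xs))
    balanced = fit_weighted_logistic(xs, ys, inverse_frequency_weights(ys))
    for name, (b0, b1) in [("unweighted", plain), ("balanced", balanced)]:
        print(name, "p(y=1 | x=3.0) =", round(sigmoid(b0 + b1 * 3.0), 3))
```

Newton's method is used instead of plain gradient descent because the model has only two parameters, so the 2x2 Hessian can be inverted in closed form and the fit converges in a few iterations. On this toy data the unweighted fit stays below 0.5 at x = 3.0, while the balanced fit moves above it.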
About the Authors
Osamu Komori, The Institute of Statistical Mathematics
Shinto Eguchi, The Institute of Statistical Mathematics