Positive Unlabeled Learning
Provisional Chinese title: 正標籤與未標籤學習

Jaskie, Kristen, Spanias, Andreas

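  • Publisher: Morgan & Claypool
  • Publication date: 2022-04-20
  • List price: $2,250
  • VIP price: $2,138 (5% off)
  • Language: English
  • Pages: 154
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 163639308X
  • ISBN-13: 9781636393087
  • Overseas special-order title (must be checked out separately)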

Product description

Machine learning and artificial intelligence (AI) are powerful tools that create predictive models, extract information, and help make complex decisions. They do this by examining an enormous quantity of labeled training data to find patterns too complex for human observation. However, in many real-world applications, well-labeled data can be difficult, expensive, or even impossible to obtain. In some cases, such as when identifying rare objects like new archeological sites or secret enemy military facilities in satellite images, acquiring labels could require months of work by trained human observers at great expense. At other times, as when attempting to predict disease infection during a pandemic such as COVID-19, reliable true labels may be nearly impossible to obtain early on due to a lack of testing equipment or other factors. In that scenario, identifying even a small amount of truly negative data may be impossible due to the high false negative rate of available tests. In such problems, it is possible to label a small subset of data as belonging to the class of interest, though it is impractical to manually label all data not of interest. We are left with a small set of positive labeled data and a large set of unknown and unlabeled data.

Readers will explore this Positive and Unlabeled learning (PU learning) problem in depth. The book rigorously defines the PU learning problem, discusses common assumptions made about it and their implications, and considers how to evaluate solutions before describing several of the most popular algorithms for solving it. It explores several uses for PU learning, including applications in the biological/medical, business, security, and signal processing domains. This book also provides high-level summaries of several related learning problems, such as one-class classification, anomaly detection, and noisy learning, and their relation to PU learning.
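
The data setup described above (a small set of labeled positives plus a large unlabeled pool) can be simulated in a few lines. This is a minimal sketch, not code from the book: the function name `make_pu_dataset` and the parameters `n`, `positive_rate`, and `label_frequency` are hypothetical, and labeling each positive with the same fixed probability is an assumption made here purely for illustration.

```python
import random

def make_pu_dataset(n=1000, positive_rate=0.3, label_frequency=0.2, seed=0):
    """Simulate a PU dataset as (y, s) pairs.

    y is the hidden true class (1 = positive, 0 = negative).
    s is what the learner observes (1 = labeled positive, 0 = unlabeled).
    All parameter names and default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        y = 1 if rng.random() < positive_rate else 0
        # Only true positives can receive a label; negatives are never labeled.
        s = 1 if (y == 1 and rng.random() < label_frequency) else 0
        records.append((y, s))
    return records

data = make_pu_dataset()
labeled = [y for y, s in data if s == 1]    # true classes of the labeled examples
unlabeled = [y for y, s in data if s == 0]  # true classes of the unlabeled pool
hidden_positives = sum(unlabeled)

print(f"labeled positives:  {len(labeled)}")
print(f"unlabeled examples: {len(unlabeled)} "
      f"({hidden_positives} of them are unobserved positives)")
```

Every labeled example is a true positive, while the unlabeled pool mixes negatives with positives that simply were never labeled. This is why treating "unlabeled" as "negative" would mislabel part of the class of interest.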
