Astronomy and Big Data: A Data Clustering Approach to Identifying Uncertain Galaxy Morphology (Studies in Big Data)

Kieran Jay Edwards, Mohamed Medhat Gaber

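  • Publisher: Springer
  • Publication date: 2014-04-29
  • List price: $4,510
  • VIP price: $4,285 (5% off)
  • Language: English
  • Pages: 105
  • Binding: Hardcover
  • ISBN: 331906598X
  • ISBN-13: 9783319065984
  • Related categories: Big Data
  • Overseas special-order title (requires separate checkout)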

Product description

With the onset of massive cosmological data collection through media such as the Sloan Digital Sky Survey (SDSS), galaxy classification has been accomplished for the most part with the help of citizen science communities like Galaxy Zoo. Seeking the wisdom of the crowd for such Big Data processing has proved extremely beneficial. However, an analysis of one of the Galaxy Zoo morphological classification data sets has shown that a significant majority of all classified galaxies are labelled as “Uncertain”.

This book reports on how to use data mining, and more specifically clustering, to identify galaxies for which the public has shown some degree of uncertainty as to whether they belong to one morphology type or another. The book shows the importance of transitions between different data mining techniques in an insightful workflow. It demonstrates that clustering makes it possible to identify discriminating features in the analysed data sets, adopting a novel feature selection algorithm called Incremental Feature Selection (IFS). The book shows the use of state-of-the-art classification techniques, Random Forests and Support Vector Machines, to validate the acquired results. It is concluded that a vast majority of these galaxies are, in fact, of spiral morphology, with a small subset potentially consisting of stars, elliptical galaxies or galaxies of other morphological variants.
