Feature Selection and Enhanced Krill Herd Algorithm for Text Document Clustering
Abualigah, Laith Mohammad Qasim
- Publisher: Springer
- Publication date: 2019-01-03
- Price: $4,510
- VIP price: 5% off, $4,285
- Language: English
- Pages: 165
- Binding: Hardcover - also called cloth, retail trade, or trade
- ISBN: 303010673X
- ISBN-13: 9783030106737
- Related categories: Algorithms-data-structures
Product description
This book puts forward a new method for solving the text document (TD) clustering problem, organized in two main stages. (i) A new feature selection method based on a particle swarm optimization algorithm with a novel weighting scheme is proposed, together with a detailed dimension reduction technique, in order to obtain a new subset of more informative features in a low-dimensional space. This subset is then used to improve the performance of the text clustering (TC) algorithm and reduce its computation time. The k-means clustering algorithm is used to evaluate the effectiveness of the obtained subsets. (ii) Four krill herd algorithms (KHAs), namely the (a) basic KHA, (b) modified KHA, (c) hybrid KHA, and (d) multi-objective hybrid KHA, are proposed to solve the TC problem; each algorithm represents an incremental improvement on its predecessor. For the evaluation, seven benchmark text datasets with different characteristics and complexities are used.
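The first stage described above (weight the terms, keep a smaller feature subset, then judge that subset with k-means) can be illustrated with a minimal Python sketch. It is not the book's method: the selector here is a simple document-frequency filter standing in for the particle swarm optimization step, the TF-IDF weighting is the textbook form rather than the book's novel scheme, and the corpus and all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one TF-IDF weight vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append([tf[t] * math.log(len(docs) / df[t]) for t in vocab])
    return vocab, vectors


def select_features(vectors, min_docs=2):
    """Stand-in selector (NOT the book's PSO method): keep columns that have
    a non-zero weight in at least `min_docs` documents."""
    n_features = len(vectors[0])
    return [j for j in range(n_features)
            if sum(1 for v in vectors if v[j] > 0) >= min_docs]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus, not one of the book's benchmark datasets.
docs = [
    "krill herd swarm optimization algorithm",
    "particle swarm optimization algorithm",
    "football match goal score team",
    "team score goal football league",
]
vocab, vectors = tfidf_vectors(docs)
columns = select_features(vectors)
selected_terms = [vocab[j] for j in columns]
reduced = [[v[j] for j in columns] for v in vectors]
labels = kmeans(reduced, k=2)
print(selected_terms)
print(labels)
```

On this toy corpus the filter drops the terms that occur in only one document, and k-means on the reduced vectors places the two optimization documents in one cluster and the two football documents in the other. A metaheuristic selector such as the one the book proposes would search over feature subsets instead of applying a fixed threshold.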
Text document (TD) clustering is a new trend in text mining in which TDs are partitioned into several coherent clusters, such that the documents within each cluster are similar to one another. The findings presented in the book confirm that the proposed methods and algorithms delivered the best results in comparison with similar methods from the literature.
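The idea that documents in the same cluster should be similar can be made concrete with a small Python sketch. It assumes cosine similarity as the measure, which is a common choice in text clustering; the description above does not say which measure the book uses. The vectors, labels, and function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_intra_cluster_similarity(vectors, labels):
    """Average cosine similarity over all document pairs that share a label."""
    pairs = [(i, j) for i, j in combinations(range(len(vectors)), 2)
             if labels[i] == labels[j]]
    if not pairs:
        return 0.0
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


# Hypothetical term-count vectors for four documents over four terms.
vectors = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 3, 1], [0, 0, 1, 3]]

# A grouping that follows the shared terms versus one that cuts across them.
coherent = mean_intra_cluster_similarity(vectors, [0, 0, 1, 1])
mixed = mean_intra_cluster_similarity(vectors, [0, 1, 0, 1])
print(coherent, mixed)
```

A higher average indicates a more coherent clustering under this measure: the grouping that keeps documents with shared terms together scores well above the one that mixes them.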