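(College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian 350108)
Abstract Semi-supervised clustering uses supervised information about the samples to improve the performance of unsupervised learning. This supervised information comprises class labels and pairwise constraints (must-link and cannot-link constraints). This paper proposes a semi-supervised clustering algorithm based on class labels and pairwise constraints (PLG-SSC). The algorithm draws on the strengths of genetic algorithms and makes full use of both kinds of supervised information to guide unsupervised clustering. Experimental results on UCI data sets show that PLG-SSC effectively improves clustering accuracy and is a promising semi-supervised clustering algorithm.
Key words semi-supervised clustering; class labels; pairwise constraints; genetic algorithm
CLC number: TP18 Document code: A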
A Semi-Supervised Clustering Algorithm Based on Class Labels and Pairwise Constraints
Sheng Junjie, Xie Licong
(College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian 350108)
Abstract Semi-supervised clustering uses supervised information about the samples to aid unsupervised learning. This supervised information includes class labels and pairwise constraints (must-link and cannot-link constraints). This paper presents a semi-supervised clustering algorithm based on class labels and pairwise constraints (PLG-SSC). The algorithm draws on the advantages of genetic algorithms and makes good use of both kinds of supervised information to help unsupervised clustering. Experimental results on UCI data sets indicate that the PLG-SSC algorithm effectively improves clustering accuracy and is a promising semi-supervised clustering algorithm.
Key words Semi-Supervised Clustering; Class Labels; Pairwise Constraints; Genetic Algorithm
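To make the two kinds of supervised information named in the abstract concrete, the following minimal Python sketch derives must-link and cannot-link pairs from a small set of class labels and counts how many of them a candidate cluster assignment violates. It is an illustration only and is not the PLG-SSC algorithm: the function names `constraints_from_labels` and `count_violations` and the toy data are assumptions introduced here. A violation count of this kind could plausibly serve as a penalty term in the fitness function of a genetic algorithm, but that use is likewise an assumption rather than something stated in the abstract.

```python
from itertools import combinations


def constraints_from_labels(labels):
    """Derive pairwise constraints from class labels.

    labels: dict mapping sample index -> class label (labeled subset only).
    Two labeled samples with the same label form a must-link pair;
    two with different labels form a cannot-link pair.
    """
    must_link, cannot_link = [], []
    for (i, yi), (j, yj) in combinations(sorted(labels.items()), 2):
        if yi == yj:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link


def count_violations(assignment, must_link, cannot_link):
    """Count constraints broken by a cluster assignment.

    assignment: sequence where assignment[i] is the cluster id of sample i.
    """
    broken_ml = sum(1 for i, j in must_link if assignment[i] != assignment[j])
    broken_cl = sum(1 for i, j in cannot_link if assignment[i] == assignment[j])
    return broken_ml + broken_cl


# Hypothetical toy data: five samples, three of which carry class labels.
labels = {0: "a", 1: "a", 4: "b"}
must_link, cannot_link = constraints_from_labels(labels)

# A candidate clustering that wrongly puts sample 4 with samples 0 and 1.
assignment = [0, 0, 1, 1, 0]
violations = count_violations(assignment, must_link, cannot_link)
print(must_link, cannot_link, violations)
```

In this toy case the single must-link pair is satisfied, while both cannot-link pairs are violated because sample 4 shares a cluster with samples 0 and 1.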
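About the author:
Sheng Junjie, male, born in 1988, is a master's student in Computer Application Technology. His main research interests are data mining and data integration.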