School of Computer Science and Technology, Shandong University, Jinan, China 250101
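Abstract: Dynamic application environments such as real-time monitoring systems, network intrusion detection, and user click streams on the web continuously generate massive, temporally ordered, rapidly changing, and potentially unbounded data streams. As a result, research on anomaly detection over data streams has become important and worthwhile. Data stream clustering is an active research topic in data mining and has recently received considerable attention and extensive study.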
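This paper establishes a general anomaly detection model based on clustering analysis of data streams. By improving the data stream clustering algorithm CluStream, we propose ACluStream, an algorithm suited to anomaly detection over data streams. Its online micro-clustering component clusters the data stream and stores cluster feature information according to the pyramidal time frame; its offline macro-clustering component then detects anomalous data. Experimental results show that the model can be applied effectively to anomaly detection over data streams.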
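The online micro-clustering step can be illustrated with a minimal Python sketch. It follows the cluster feature vector and maximum-boundary rule of the original CluStream [5], not the ACluStream modifications, which are not described in this section. The names `MicroCluster` and `process_point`, the `boundary_factor` default of 2.0, and the eviction policy (drop the micro-cluster with the oldest mean timestamp, instead of CluStream's relevance-based deletion or merging) are illustrative assumptions. A point that cannot be absorbed opens a new micro-cluster, which is treated here as a candidate anomaly.

```python
import math


class MicroCluster:
    """CluStream-style cluster feature vector: n, CF1x, CF2x, CF1t, CF2t."""

    def __init__(self, dim):
        self.n = 0
        self.cf1x = [0.0] * dim  # per-dimension linear sum of values
        self.cf2x = [0.0] * dim  # per-dimension sum of squared values
        self.cf1t = 0.0          # sum of arrival timestamps
        self.cf2t = 0.0          # sum of squared arrival timestamps

    def absorb(self, x, t):
        # Every component is a plain sum, so absorbing a point is O(dim).
        self.n += 1
        for i, v in enumerate(x):
            self.cf1x[i] += v
            self.cf2x[i] += v * v
        self.cf1t += t
        self.cf2t += t * t

    def centroid(self):
        return [s / self.n for s in self.cf1x]

    def rms_radius(self):
        # RMS deviation of absorbed points from the centroid, from the sums alone.
        var = sum(s2 / self.n - (s1 / self.n) ** 2
                  for s1, s2 in zip(self.cf1x, self.cf2x))
        return math.sqrt(max(var, 0.0))


def process_point(clusters, x, t, max_clusters, boundary_factor=2.0):
    """Absorb x into its nearest micro-cluster or open a new one.

    Returns True when x opens a new micro-cluster (a candidate anomaly).
    """
    if clusters:
        nearest = min(clusters, key=lambda c: math.dist(x, c.centroid()))
        d = math.dist(x, nearest.centroid())
        if nearest.n > 1:
            boundary = boundary_factor * nearest.rms_radius()
        else:
            # Singleton: use the distance to the closest other micro-cluster.
            others = [c.centroid() for c in clusters if c is not nearest]
            boundary = min((math.dist(nearest.centroid(), o) for o in others),
                           default=math.inf)
        if d <= boundary:
            nearest.absorb(x, t)
            return False
    if len(clusters) >= max_clusters:
        # Simplified eviction (assumption): drop the oldest micro-cluster
        # by mean timestamp instead of CluStream's delete-or-merge step.
        clusters.remove(min(clusters, key=lambda c: c.cf1t / c.n))
    new = MicroCluster(len(x))
    new.absorb(x, t)
    clusters.append(new)
    return True
```

Because every statistic is a sum, two micro-clusters can be merged, or an older state subtracted, component by component; this additivity is what makes periodic snapshots of the micro-clusters useful for the offline phase.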
Key words: data stream; clustering; anomaly detection
Anomaly detection of data streams based on clustering analysis
Zhiyuan SHANG Yuegong ZHANG Nuonuo ZHANG
School of Computer Science and Technology, Shandong University, Jinan, China 250101
Abstract: Research on the data stream model has recently attracted considerable attention because of its applications, including real-time surveillance systems, network intrusion detection, and click streams. Data stream clustering, one of the most important topics in data mining, has recently been studied extensively because of its applications to data summarization and outlier detection. This paper establishes a general model for anomaly detection based on clustering analysis of data streams. By improving CluStream, a clustering algorithm for data streams, we present the ACluStream algorithm for anomaly detection in data streams. The approach is divided into an online component, which clusters the stream and stores clustering features according to the pyramidal time frame, and an offline component, which detects anomalies in the data stream. Experimental results show that the proposed algorithm and model are very effective for anomaly detection in data streams.
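The pyramidal time frame of CluStream [5] stores snapshots of order i at clock times divisible by α^i and keeps only the last α^l + 1 snapshots of each order. The sketch below, with hypothetical function names `retained_snapshots` and `closest_snapshot`, computes which snapshot times survive at a given clock time and picks the one nearest to `now - h` for a horizon query. It is a simplified illustration that tracks timestamps only; a working system would store the micro-cluster feature vectors alongside each retained time.

```python
def retained_snapshots(now, alpha, l):
    """Snapshot clock times kept at time `now` under the pyramidal time frame.

    Order i covers times divisible by alpha**i; for each order only the
    last alpha**l + 1 such times (up to `now`) are kept. A time that belongs
    to several orders is stored once. Requires an integer alpha >= 2.
    """
    kept = set()
    step = 1  # alpha**i for the current order i
    while step <= now:
        latest = (now // step) * step
        for k in range(alpha ** l + 1):
            ts = latest - k * step
            if ts > 0:  # the clock is assumed to start at 1
                kept.add(ts)
        step *= alpha
    return sorted(kept)


def closest_snapshot(kept, target):
    """Stored snapshot time nearest to target (e.g. now - h for horizon h)."""
    return min(kept, key=lambda ts: abs(ts - target))


kept = retained_snapshots(now=16, alpha=2, l=1)
print(kept)                        # [8, 12, 14, 15, 16]
print(closest_snapshot(kept, 11))  # 12
```

With α = 2 and l = 1 at clock time 16, the retained times are 8, 12, 14, 15 and 16: recent history is kept densely and older history sparsely, which bounds storage while still allowing approximate horizon queries.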
Key words: data stream; clustering; anomaly detection
Intrusion detection techniques are divided into anomaly detection and misuse detection. Anomaly detection has gained extensive attention because it requires no a priori knowledge of intrusions and can capture new, previously unknown intrusions. Network anomaly detection has developed since Heberlein et al. introduced the Network Security Monitor (NSM) [1] in 1990; approaches to date include probabilistic and statistical analysis methods, data mining methods, and methods that simulate biological systems. Anomaly detection techniques based on data mining use association rules, sequence mining, data classification, and clustering algorithms to automatically generate concise and accurate detection models from large volumes of network data. Data streams are characterized by continuous arrival, high speed, and large scale. Because of its potential applications in financial risk analysis, network monitoring, trend analysis, network communication, sensor networks, and other areas, anomaly detection over data streams is attracting growing attention from academia and industry [2]. Therefore, anomaly detection of data streams based on clustering analysis has become an important research direction.
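For the offline component, CluStream [5] recovers the micro-clusters of a time horizon by subtracting an older snapshot from the current one, matching micro-clusters by identifier. The sketch below shows that subtraction. The anomaly rule in `flag_sparse` (flag micro-clusters holding less than a fixed fraction of the horizon's points) is a hypothetical stand-in, since the ACluStream detection criterion is not given in this section; the `CF` type, the identifiers, and the threshold are likewise assumptions, and micro-clusters deleted or merged between the two snapshots are ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CF:
    """Simplified micro-cluster feature vector: count, linear sums, square sums."""
    n: int
    cf1x: tuple
    cf2x: tuple

    def __sub__(self, other):
        # Subtractive property: component-wise difference of two snapshots.
        return CF(self.n - other.n,
                  tuple(a - b for a, b in zip(self.cf1x, other.cf1x)),
                  tuple(a - b for a, b in zip(self.cf2x, other.cf2x)))


def horizon_clusters(current, past):
    """Micro-clusters restricted to points that arrived after the `past` snapshot.

    Both arguments map a micro-cluster id to its CF. Ids missing from `past`
    were created inside the horizon and are kept whole.
    """
    result = {}
    for cid, cf in current.items():
        diff = cf - past[cid] if cid in past else cf
        if diff.n > 0:
            result[cid] = diff
    return result


def flag_sparse(clusters, min_fraction):
    """Hypothetical anomaly rule: ids of micro-clusters holding fewer than
    min_fraction of all points in the horizon."""
    total = sum(cf.n for cf in clusters.values())
    return sorted(cid for cid, cf in clusters.items()
                  if cf.n < min_fraction * total)


past = {"a": CF(100, (100.0,), (150.0,)), "b": CF(50, (500.0,), (5000.0,))}
current = {"a": CF(180, (190.0,), (280.0,)),
           "b": CF(52, (520.0,), (5200.0,)),
           "c": CF(3, (90.0,), (2700.0,))}
recent = horizon_clusters(current, past)
print(flag_sparse(recent, 0.05))  # ['b', 'c']
```

In this toy example, micro-cluster "b" is large overall but received only two points in the horizon, and "c" is new and small, so both fall below 5% of the 85 recent points.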
References
[1] Heberlein L, Dias G V, Levitt K N. A network security monitor. 1990.
[2] Han Jiawei, Kamber M. Data Mining: Concepts and Techniques, Second Edition[M]. China Machine Press, 2007: 251-300.
[3] Guha S, Mishra N, Motwani R, et al. Clustering data streams[C]//Proceedings of the 41st Annual Symposium on Foundations of Computer Science. Redondo Beach: IEEE Computer Society, 2000: 359-366.
[4] O’Callaghan L, Mishra N, Meyerson A, et al. Streaming-data algorithms for high-quality clustering[C]//Proceedings of IEEE International Conference on Data Engineering, 2002: 1-25.
[5] Aggarwal C C, Han J W, Wang J Y. A framework for clustering evolving data streams[C]//Proceedings of the 29th Very Large Databases Conference. Berlin: VLDB Endowment, 2003: 81-92.
[6] Aggarwal C C, Han J W, Wang J Y. A framework for projected clustering of high dimensional data streams[C]//Proceedings of the 30th Very Large Databases Conference. Toronto: VLDB Endowment, 2004: 852-863.
[7] Hawkins D. Identification of outliers[M]. London: Chapman and Hall, 1980: 15-16.
[8] Tang Peixia. Research and application of algorithms for clustering data streams[D]. Shandong Normal University, 2008.
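About the author: Zhiyuan SHANG, male, born in February 1983, is a graduate student (class of 2009) in computer applications at the School of Computer Science and Technology, Shandong University. His main research interest is network security.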