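(School of Software, Beijing University of Aeronautics and Astronautics, Beijing 100191)
Abstract: Enterprises have an increasingly rich set of information management tools, yet processing big data has become a development bottleneck for many technology companies. Taking a sales enterprise's purchase, sales, and inventory management system as its model, this paper studies how to upgrade a traditional single-server system into a multi-server system that supports distributed parallel processing, that is, how to rebuild the enterprise's private cloud platform. The approach uses a divide-and-conquer strategy: it calls algorithms preset in the system to split a big data task evenly into a cluster of smaller subtasks, which a central control node dispatches to idle subtask processing nodes according to the system's dynamic load balancing principle. The results of all subtasks are then aggregated, which achieves distributed processing of the big data task. The system is mainly intended to meet enterprises' needs for mining massive data.
Keywords: big data processing; distributed system; cloud computing; private cloud platform; data mining.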
Design and Implementation of a Distributed Computing System and Performance Study
Jianchao Yang
(School of Software, Beijing University of Aeronautics and Astronautics, Beijing 100191)
vheaven@163.com
Abstract: Enterprise information management tools are increasingly abundant, yet processing big data has become a development bottleneck for many technology companies. This paper takes a clothing sales enterprise's invoicing (purchase, sales, and inventory) management system as a model and upgrades it from a traditional single-server system to a multi-server system that supports parallel processing on a distributed system, in effect restructuring the enterprise's private cloud platform. The approach uses a divide-and-conquer strategy: it calls the system's preset algorithm to decompose a large task into several smaller subtasks and data clusters. Following the dynamic load balancing principle, a central control node distributes these subtasks to the subtask processing nodes that are currently free, then aggregates all the subtask results, thereby completing the big data processing task. The system is mainly intended to meet an enterprise's massive data mining needs.
Keywords: big data processing; distributed system; cloud computing; private cloud platform; data mining.
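The split, dispatch, and aggregate flow described in the abstract can be illustrated with a minimal single-process sketch. It is not the system's actual implementation: worker threads stand in for subtask processing nodes, the executor's internal queue stands in for the central control node, and the function names (split_task, process_subtask, merge_results, run_distributed) and the sample sales records are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def split_task(records, num_subtasks):
    """Divide the records into at most num_subtasks chunks of near-equal size."""
    if num_subtasks < 1:
        raise ValueError("num_subtasks must be at least 1")
    base, extra = divmod(len(records), num_subtasks)
    chunks, start = [], 0
    for i in range(num_subtasks):
        size = base + (1 if i < extra else 0)
        if size:
            chunks.append(records[start:start + size])
        start += size
    return chunks


def process_subtask(chunk):
    """Hypothetical subtask: total sales quantity per product in one chunk."""
    totals = {}
    for product, quantity in chunk:
        totals[product] = totals.get(product, 0) + quantity
    return totals


def merge_results(partials):
    """Combine the partial totals from all subtasks into one summary."""
    merged = {}
    for partial in partials:
        for product, quantity in partial.items():
            merged[product] = merged.get(product, 0) + quantity
    return merged


def run_distributed(records, num_subtasks=8, num_workers=3):
    """Split the task, hand subtasks to whichever worker is free, then aggregate."""
    chunks = split_task(records, num_subtasks)
    # Worker threads stand in for processing nodes (an assumption of this
    # sketch). The executor keeps pending subtasks in a queue, and each worker
    # takes the next one as soon as it becomes idle, so faster workers end up
    # handling more subtasks.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(process_subtask, chunk) for chunk in chunks]
        partials = [future.result() for future in as_completed(futures)]
    return merge_results(partials)


if __name__ == "__main__":
    # Hypothetical (product, quantity) sales records.
    sales = [("shirt", 2), ("coat", 1), ("shirt", 5), ("scarf", 3), ("coat", 4)]
    print(run_distributed(sales, num_subtasks=3, num_workers=2))
```

Pulling work from a shared queue is only one simple form of dynamic load balancing. A multi-server deployment would also need network transport, node health tracking, and retry on failure, none of which this sketch attempts.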
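About the author:
Jianchao Yang is a master's student in the School of Software at Beijing University of Aeronautics and Astronautics, majoring in mobile cloud computing, and an intern at Microsoft. Yang has taken part in the research and development of several scientific and commercial projects, and works mainly on distributed computing, cloud computing, and data mining.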