<p id="nxp5x"><big id="nxp5x"><noframes id="nxp5x">


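            Data Mining Based on Pattern Space Division on a Map/Reduce Cluster
            Authors: LIU Qian1, CHEN Ming2
            Source: original to this site
            Updated: 2012/8/2 11:06:00
            Main text:

            (1,2 Department of Computer Science and Technology, China University of Petroleum, Beijing 102249)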

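            Abstract: Through pattern space division, the Map/Reduce-based problem of handling the many-to-many correspondence between the data set and the set of candidate target patterns is transformed into the problem of handling the many-to-many correspondence between the data set and the individual pattern subspaces. This greatly reduces the size of the set of intermediate key/value pairs and avoids the single-Map-node bottleneck caused by combinatorial explosion. Over multiple rounds of Map/Reduce tasks, the pattern space is built and divided and the filtering rules are established and applied; on that basis, complex types of patterns are mined independently in each pattern subspace. By making full use of both the global characteristics of the whole pattern space and the individual characteristics of each pattern subspace, an optimized mining algorithm is designed, which improves the efficiency of the mining phase.
            Keywords: Map/Reduce; pattern space division; data mining; cloud computing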
            Data Mining Based on Pattern Space Division in a Map/Reduce Cluster
            LIU Qian1, CHEN Ming2
            (1,2 Department of Computer Science and Technology, China University of Petroleum, Beijing 102249, China)
            Abstract: By means of pattern space division based on Map/Reduce, the problem of processing the many-to-many correspondence between the data set and the pattern set is converted into the problem of processing the many-to-many correspondence between the data subsets and the pattern subspaces associated with the radix of the pattern subspaces. The set of intermediate key/value pairs thus shrinks so dramatically that the single-Map-node bottleneck, which results from the combinatorial explosion of the candidate pattern space, is avoided. Over several rounds of Map/Reduce tasks, the pattern space is constructed and divided, and the filtering rules are set up and applied; furthermore, data mining is carried out in each pattern subspace independently. By exploiting both the global characteristics of the whole pattern space and the individual characteristics of each pattern subspace, an optimized non-recursive algorithm is designed and implemented to improve the efficiency of the mining phase.
            Key words: Map/Reduce; Pattern Space Division; Data Mining; Cloud Computing
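The multi-round flow described in the abstract (build the pattern space, derive a filtering rule, divide the space, then mine each subspace independently) can be illustrated with a small sketch. This is not the authors' algorithm: it assumes frequent-itemset mining as the pattern type, assumes each subspace holds the patterns that share the same smallest item, and replaces the optimized non-recursive miner with brute-force enumeration. The names `run_mapreduce`, `mine`, `project`, and `mine_subspace` are hypothetical, and the Map/Reduce rounds are simulated in a single process.

```python
from collections import defaultdict
from itertools import combinations


def run_mapreduce(records, mapper, reducer):
    """In-process stand-in for one Map/Reduce round (assumption, not a real cluster)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


def mine(transactions, min_support):
    """Return {itemset tuple: support} for itemsets with support >= min_support."""
    # Round 1: count single items; the frequent ones span the pattern space.
    counts = run_mapreduce(
        transactions,
        lambda t: ((item, 1) for item in set(t)),
        lambda item, ones: sum(ones),
    )
    # Filtering rule: an infrequent item cannot occur in any frequent pattern.
    frequent = {item for item, c in counts.items() if c >= min_support}

    # Round 2 (map): divide the pattern space by each pattern's smallest item.
    def project(t):
        kept = sorted(frequent.intersection(t))
        for pos, head in enumerate(kept):
            # Only items after `head` can extend patterns in head's subspace.
            yield head, tuple(kept[pos + 1:])

    # Round 2 (reduce): mine one subspace with no knowledge of the others.
    # Brute-force enumeration, exponential in suffix length; a placeholder only.
    def mine_subspace(head, suffixes):
        support = defaultdict(int)
        for suffix in suffixes:
            for size in range(len(suffix) + 1):
                for tail in combinations(suffix, size):
                    support[(head,) + tail] += 1
        return {p: c for p, c in support.items() if c >= min_support}

    per_subspace = run_mapreduce(transactions, project, mine_subspace)
    result = {}
    for patterns in per_subspace.values():
        result.update(patterns)
    return result


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for pattern, support in sorted(mine(data, 3).items()):
        print(pattern, support)
```

In this sketch each reducer sees only the projected suffixes for its own head item, so no single node has to hold the full set of candidate patterns. With the sample data and a minimum support of 3, every single item and every pair survives, while the triple a, b, c (support 2) is dropped.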

             

             



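            About the authors:
            LIU Qian (born 1985), male, Ph.D. candidate; research interests: distributed computing, computational intelligence, and software engineering.
            CHEN Ming (born 1949), male, professor and doctoral supervisor; research interests: distributed computing, computational intelligence, and software engineering.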

             
             
               
            《通信市場》 (Communications Market), 49 Fuxing Road, Beijing 100036, China