MR-ARM: A Map-Reduce Association Rule Mining Framework
World Scientific Publishing Co. Pte Ltd
Association rule mining is one of the primary tasks in data mining; it discovers correlations among items in a transactional database. Most vertical and horizontal association rule mining algorithms have been developed to improve the frequent item discovery step, which places high demands on training time and memory usage, particularly when the input database is very large. In this paper, we address the problem of mining very large data by proposing a new parallel Map-Reduce (MR) association rule mining technique called MR-ARM, which uses a hybrid data transformation format to quickly find frequent items and generate rules. The MR programming paradigm is becoming popular for large-scale, data-intensive distributed applications due to its efficiency, simplicity and ease of use; the proposed algorithm therefore develops a fast parallel distributed batch set intersection method for finding frequent items. Two implementations (Weka, Hadoop) of the proposed MR association rule algorithm have been developed, and a number of experiments on small, medium and large data collections have been conducted. The comparisons are based on the time the algorithm requires for data initialisation, frequent item discovery, rule generation, etc. The results show that MR-ARM is a very useful tool for mining association rules from large datasets in a distributed environment. © 2013 World Scientific Publishing Company.
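The set intersection idea mentioned in the abstract can be illustrated with a minimal single-process sketch. It is not the MR-ARM algorithm itself, whose hybrid data format and batch intersection method are not described in this record. The sketch assumes a plain vertical layout (each item mapped to the set of transaction IDs containing it) and an absolute minimum support count; the function names `map_phase`, `reduce_phase` and `frequent_itemsets` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transactions):
    """Map step (illustrative): emit one (item, tid) pair per item occurrence."""
    for tid, items in transactions:
        for item in set(items):
            yield item, tid


def reduce_phase(pairs):
    """Reduce step (illustrative): group tids by item into vertical tid-sets."""
    tidsets = defaultdict(set)
    for item, tid in pairs:
        tidsets[item].add(tid)
    return tidsets


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} found by intersecting tid-sets level by level."""
    tidsets = reduce_phase(map_phase(transactions))
    # Level 1: keep single items whose tid-set is large enough.
    current = {
        frozenset([item]): tids
        for item, tids in tidsets.items()
        if len(tids) >= min_support
    }
    result = {}
    while current:
        result.update({itemset: len(tids) for itemset, tids in current.items()})
        next_level = {}
        for (a, tids_a), (b, tids_b) in combinations(current.items(), 2):
            candidate = a | b
            # Only join itemsets that differ in exactly one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Support of the union is the size of the tid-set intersection.
            tids = tids_a & tids_b
            if len(tids) >= min_support:
                next_level[candidate] = tids
        current = next_level
    return result


if __name__ == "__main__":
    # Hypothetical toy database: (transaction id, items).
    db = [(1, ["a", "b", "c"]), (2, ["a", "b"]), (3, ["a", "c"]), (4, ["b", "c"])]
    for itemset, support in frequent_itemsets(db, min_support=2).items():
        print(sorted(itemset), support)
```

In a real distributed setting the map and reduce steps would run across cluster nodes (for example as Hadoop jobs); here they are ordinary Python functions so that the intersection logic can be read in isolation.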
This article is not available in the CUD collection. The version of record of this article is published in Parallel Processing Letters (2013), available online at: https://doi.org/10.1142/S0129626413500126
Association rules mining, Distributed tasks, Hadoop, Map-reduce, Parallel process, Algorithms, Association rules, Multiprocessing systems, Data mining
Thabtah, F., & Hammoud, S. (2013). MR-ARM: A map-reduce association rule mining framework. Parallel Processing Letters, 23(3). https://doi.org/10.1142/S0129626413500126