Towards a Simplified View of Data Management Maturity Models (World Scientific, 2023) Harguem, Saida; Ben Boubaker, Karim

Towards proactive crowd management based on predictive analytics framework (Institute of Advanced Scientific Research, Inc., 2019) Jaffar, D. Ahmad; Sergio, P. Rommel; Sharaf, M. Soheil; Abdullah, B. Munir

Designing an MI-PCA based Agile Intrusion Detection System (Institute of Electrical and Electronics Engineers Inc., 2022) Kaushik, Sunil; Bhadrdwaj, Akashdeep; Rehman, Ateeq Ur; Bharany, Salil; Harguem, Saida; Kukunuru, Saigeeta; Thawabeh, Ossma Ali

Influencing Factors of E-Learning Towards E-Learner's Satisfaction (Institute of Electrical and Electronics Engineers Inc., 2022) Harguem, Saida; Marwaha, Sunita; Noaman, Samar; Ali, Naeem; Ali, Nasir; Kanwal, Khadija

Machine Learning Based Prediction of Stock Exchange on NASDAQ 100: A Twitter Mining Approach (Institute of Electrical and Electronics Engineers Inc., 2022) Harguem, Saida; Chabani, Zakariya; Noaman, Samar; Amjad, Muhammad; Alvi, Muhammad Bux; Asif, Muhammad; Mehmood, Muhammad Hassaan; Al-Kassem, Amer Hani

OMCOKE: A Machine Learning Outlier-based Overlapping Clustering Technique for Multi-Label Data Analysis (Slovene Society Informatika, 2022-11) Baadel, Said; Thabtah, Fadi; Lu, Joan; Harguem, Saida
Clustering is one of the more challenging machine learning techniques because of its unsupervised nature. While many clustering algorithms constrain each object to a single cluster, K-means overlapping partitioning clustering methods relax this constraint and allow objects to belong to more than one cluster, to better fit hidden structures in the data. However, when datasets contain outliers, these can significantly influence the mean distance of the data objects to their respective clusters, which is a drawback. Most researchers therefore address this problem by simply removing the outliers.
This can be problematic, especially in applications such as fraud detection or risk analysis of cybersecurity attacks. In this study, an alternative solution is proposed that captures outliers and stores them on the fly in a new cluster instead of discarding them. The new algorithm is named Outlier-based Multi-Cluster Overlapping K-Means Extension (OMCOKE). Empirical results on real-life multi-label datasets were derived to compare OMCOKE’s performance with other common overlapping clustering techniques. The results show that OMCOKE produced a better precision rate than the other clustering algorithms considered. This method can benefit various stakeholders, as these outliers could have real-life applications in cybersecurity, fraud detection, and anti-phishing for websites. © 2022 Slovene Society Informatika. All rights reserved.

Information Technology Governance in the Tunisian Banking Industry: An Exploratory Study (Richtmann Publishing Ltd, 2022-05) Harguem, Saida; Boubaker, Karim Ben; Echatti, Houcine
Information Technology (IT) has become the foundation for supporting and sustaining businesses. The strategic importance of IT has prompted many organizations to extend governance to IT and place it high on their agendas. Banks are among the organizations that rely heavily on IT to enhance their service delivery capabilities. In addition, globalization, competition, and compliance requirements have pushed banks to consider IT Governance as part of their overall corporate governance strategy. Past studies have shown that IT Governance in the financial industry is more mature than in other sectors. However, there is little information about IT Governance in economically developing nations. This article uses a Delphi study to evaluate the Perceived Efficiency and Ease of Implementation of IT Governance practices in the Tunisian banking industry.
The results show that, compared with Process and Relational Mechanisms, Structural Practices are perceived to be more effective and easier to implement. This research helps to better understand the current state of IT Governance implementation in less developed countries. © 2022 Harguem et al.

Towards Goal-Oriented Software Requirements Elicitation (Institute of Electrical and Electronics Engineers Inc., 2021) Redouane, Abdesselam
Correct and unambiguous software requirements are key to the success of any software engineering project. Eliciting such requirements is a daunting task. In this paper, we present a framework that uses goal orientation as its main building block. Unlike other frameworks reported in the literature, this framework strives to strike a balance between formal methods on the one hand and natural language on the other when specifying operations. A chatbot for COVID-19 is presented to illustrate the framework. © 2021 IEEE.

Phishing detection based associative classification data mining (Elsevier Ltd, 2014-10-01) Abdelhamid, Neda; Ayesh, Aladdin; Thabtah, Fadi

Tutorial and critical analysis of phishing websites methods (Elsevier Ireland Ltd, 2015-08-01) Mohammad, Rami M.; Thabtah, Fadi; McCluskey, Lee

A Conceptual Framework on IT Governance Impact on Organizational Performance: A Dynamic Capability Perspective (Richtmann Publishing Ltd, 2021-01-17) Harguem, Saida
Recent years have seen substantial growth in Information Technology Governance (ITG) research. However, the influence of ITG on organizational performance has received less coverage and very little theorizing. To address this gap, this paper builds a conceptual framework that provides a better understanding of the contribution of ITG to organizational performance.
Based on an extensive literature review on ITG and guided by the dynamic capabilities perspective, the proposed conceptual framework analyses the ITG – Organizational Performance relationship and generates a set of five research propositions. The framework suggests that the effectiveness of ITG mechanisms (structures, processes, and relational mechanisms) contributes to the development of a dynamic ITG competence, which in turn affects the development and evolution of IT management capabilities. Moreover, the framework suggests that ITG is more likely to lead to better organizational performance when IT management capabilities are developed in line with business strategy. © 2021 Saida Harguem.

Predicting phishing websites based on self-structuring neural network (Springer London, 2014-08) Mohammad, Rami M.; Thabtah, Fadi; McCluskey, Lee

Enterprise web services-enabled translation framework (Springer Nature Switzerland AG, 2011) Serhani, Mohamed Adel; Jaffar, Ahmed; Campbell, Piers; Atif, Y.
Managing multilingual documents is a time-consuming, error-prone and expensive task, particularly when dealing with dynamic documents such as web content. A broad spectrum of organizations, such as corporations, NGOs and governments, are committed to offering such documents in a number of languages, with the content further localized to suit specific cultural settings. In this paper, we propose a business model supported by a web services-enabled framework, which facilitates all aspects of multilingual web content management, from negotiating translation-request quotations through production of the final localized output to its verification and delivery. This service is based on a collaborative internet-based translation framework, referred to in this paper as the Translation Management System (TMS).
Our approach uses XLIFF, a Web service standard developed by OASIS, in order to interoperate enterprise translation services and related Web applications. We present and implement a translation business model centered around standardized processes, which we validate through a case study in the context of a Web translation project. We also propose a QoS monitoring model to satisfy the quality-related requirements of a translation job. Finally, we evaluate the usability of our streamlined Web translation services through users' perception in terms of flexibility, ease of use, and quality of translation. The results revealed interesting performance tradeoffs relative to translation workflows and content-translation accuracy, as well as the flexibility and diversity of the services provided by the TMS. © 2010 Springer-Verlag.

A dynamic rule-induction method for classification in data mining (Taylor and Francis Ltd., 2015) Qabajeh, Issa; Chiclana, Francisco; Thabtah, Fadi
Rule induction (RI) produces classifiers containing simple yet effective ‘If–Then’ rules for decision makers. RI algorithms, normally based on PRISM, suffer from a few drawbacks mainly related to rule pruning and rule-sharing items (attribute values) in the training data instances. In response to these two issues, a new dynamic rule induction (DRI) method is proposed. Whenever a rule is produced and its related training data instances are discarded, DRI updates the frequency of the attribute values used to make the next in-line rule to reflect the data deletion. The attribute value frequencies are therefore adjusted dynamically each time a rule is generated, rather than statically as in PRISM. This enables DRI to generate near-perfect rules and realistic classifiers. Experimental results using different University of California Irvine data sets show that DRI is competitive with other RI algorithms with regard to error rate and classifier size.
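The dynamic frequency update described in the DRI abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the row layout (a feature dictionary plus a class label) and the toy data are all hypothetical, and the sketch covers only the bookkeeping step, not rule generation or pruning.

```python
from collections import Counter

def count_items(rows):
    """Count (attribute, value, label) occurrences over the training rows."""
    counts = Counter()
    for features, label in rows:
        for attr, value in features.items():
            counts[(attr, value, label)] += 1
    return counts

def remove_covered(rows, counts, rule_items):
    """Drop rows covered by a rule body and decrement their counts in place.

    A row is covered when it contains every (attribute, value) pair in
    rule_items. Only counts touched by the deleted rows change, so no full
    recount of the remaining data is needed.
    """
    remaining = []
    for features, label in rows:
        if all(features.get(attr) == value for attr, value in rule_items):
            for attr, value in features.items():
                counts[(attr, value, label)] -= 1
        else:
            remaining.append((features, label))
    # Discard zeroed entries so the counter mirrors a fresh recount.
    for key in [k for k, c in counts.items() if c == 0]:
        del counts[key]
    return remaining

# Hypothetical toy data, for illustration only.
rows = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
]
counts = count_items(rows)
# Suppose the rule "IF windy = yes THEN stay" has just been produced.
rows_left = remove_covered(rows, counts, [("windy", "yes")])
```

After the call, the incrementally maintained `counts` equals what `count_items(rows_left)` would return, which is the property the abstract attributes to dynamic adjustment.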
© 2015 Antai College of Economics and Management, Shanghai Jiao Tong University.

MR-ARM: a map-reduce association rule mining framework (World Scientific Publishing Co. Pte Ltd, 2013) Thabtah, Fadi; Hammoud, Suhel
Association rule mining is one of the primary tasks in data mining; it discovers correlations among items in a transactional database. The majority of vertical and horizontal association rule mining algorithms have been developed to improve the frequent item discovery step, which places high demands on training time and memory usage, particularly when the input database is very large. In this paper, we overcome the problem of mining very large data by proposing a new parallel Map-Reduce (MR) association rule mining technique called MR-ARM, which uses a hybrid data transformation format to quickly find frequent items and generate rules. The MR programming paradigm is becoming popular for large-scale, data-intensive distributed applications because of its efficiency, simplicity and ease of use, and the proposed algorithm therefore develops a fast parallel distributed batch set intersection method for finding frequent items. Two implementations (Weka, Hadoop) of the proposed MR association rule algorithm have been developed, and a number of experiments against small, medium and large data collections have been conducted. The bases of comparison are the times required by the algorithm for data initialisation, frequent item discovery, rule generation, etc. The results show that MR-ARM is a very useful tool for mining association rules from large datasets in a distributed environment. © 2013 World Scientific Publishing Company.

Associative classification approaches: review and comparison (World Scientific Publishing Co. Pte Ltd, 2014) Abdelhamid, Neda; Thabtah, Fadi
Associative classification (AC) is a promising data mining approach that integrates classification and association rule discovery to build classification models (classifiers).
In the last decade, several AC algorithms have been proposed, such as Classification based Association (CBA), Classification based on Predicted Association Rule (CPAR), Multi-class Classification using Association Rule (MCAR), Live and Let Live (L3) and others. These algorithms use different procedures for rule learning, rule sorting, rule pruning, classifier building and class allocation for test cases. This paper sheds light on and critically compares common AC algorithms with reference to these procedures. Moreover, data representation formats in AC mining are discussed along with potential new research directions. © 2014 World Scientific Publishing Co.

Parallel associative classification data mining frameworks based mapreduce (World Scientific Publishing Co. Pte Ltd, 2015-06) Thabtah, Fadi; Hammoud, Suhel; Abdel-Jaber, Hussein
Associative classification (AC) is a research topic that integrates association rules with classification in data mining to build classifiers. Since the dissemination of the Classification-based Association Rule algorithm (CBA), the majority of its successors have been developed to improve either CBA's prediction accuracy or the search for frequent ruleitems in the rule discovery step. Both of these steps place high demands on processing time and memory, especially with large training data sets or a low minimum support threshold. In this paper, we overcome the problem of mining large training data sets by proposing a new learning method that repeatedly transforms data between line and item spaces to quickly discover frequent ruleitems, generate rules, and subsequently rank and prune rules. This new learning method has been implemented in a parallel Map-Reduce (MR) algorithm called MRMCAR, which can be considered the first parallel AC algorithm in the literature.
The new learning method can be utilised in the different steps of any AC or association rule mining algorithm, and it scales well compared with current horizontal or vertical methods. Two versions of the learning method (Weka, Hadoop) have been implemented, and a number of experiments against different data sets have been conducted. The bases of comparison are classification accuracy and the time required by the algorithm for data initialization, frequent ruleitem discovery, rule generation and rule pruning. The results reveal that MRMCAR is superior to both current AC mining algorithms and rule-based classification algorithms with respect to classification accuracy. © 2015 World Scientific Publishing Company.

Modeling discrete-time analytical models based on random early detection: exponential and linear (World Scientific Publishing Co. Pte Ltd, 2015) Abdel-Jaber, Hussein; Thabtah, Fadi; Woodward, Mike
Congestion control is among the primary topics in computer networking, and the random early detection (RED) method is one of its common techniques. Nevertheless, RED suffers from drawbacks, in particular when its "average queue length" is set below the buffer's "minimum threshold" position, which makes the router buffer overflow quickly. To deal with this issue, this paper proposes two discrete-time queue analytical models that use an instant queue length parameter as a congestion measure. This gives mean queue length (mql) and average queueing delay smaller values than those for RED and eventually reduces buffer overflow. A comparison between RED and the proposed analytical models was conducted to identify the model that offers better performance. The proposed models outperform the classic RED with regard to the mql and average queueing delay measures when congestion exists.
This work also compares one of the proposed models (RED-Linear) with another analytical model named threshold-based linear reduction of arrival rate (TLRAR). The mql, average queueing delay and probability of packet loss results for TLRAR deteriorate when heavy congestion occurs, whereas the results of our RED-Linear were not affected, which shows the superiority of our model. © 2015 World Scientific Publishing Company.

Phishing detection: a case analysis on classifiers with rules using machine learning (World Scientific Publishing Co. Pte Ltd, 2017-12-01) Thabtah, Fadi; Kamalov, Firuz
A typical predictive approach in data mining that produces If-Then knowledge for decision making is rule-based classification. Rule-based classification includes a large number of algorithms that fall under the categories of covering, greedy, rule induction, and associative classification. These approaches have shown promising results because of the simplicity of the models generated and the user's ability to understand and maintain them. Phishing is one of the emergent online threats in web security domains, and it necessitates anti-phishing models with rules so that users can easily differentiate among website types. This paper critically analyses recent research studies on the use of predictive models with rules for phishing detection, and evaluates the applicability of these approaches to phishing. To accomplish our task, we experimentally evaluate four different rule-based classifiers that belong to the greedy, associative classification and rule induction approaches on real phishing datasets and with respect to different evaluation measures. Moreover, we assess the classifiers derived and contrast them with known classic classification algorithms, including Bayes Net and Simple Logistics. The aim of the comparison is to determine the pros and cons of predictive models with rules and reveal their actual performance when it comes to detecting phishing activities.
The results clearly showed that eDRI, a recent greedy algorithm, not only generates useful models but that these are also highly competitive with respect to predictive accuracy and runtime when employed as anti-phishing tools. © 2017 World Scientific Publishing Co.

Associative classification common research challenges (Institute of Electrical and Electronics Engineers Inc., 2016) Abdelhamid, Neda; Jabbar, Ahmad Abdul; Thabtah, Fadi
Association rule mining involves discovering concealed correlations among variables, often from sales transactions, to help managers in key business decisions involving item shelving, sales and planning. In the last decade, association rule mining methods have been employed to derive rules from classification datasets in different business domains. This has resulted in the emergence of a new classification approach called Associative Classification (AC), which often produces classifiers with higher predictive accuracy than classic approaches such as decision trees, greedy and rule induction. Nevertheless, AC suffers from noticeable challenges, some of which have been inherited from association rules while others result from the classifier-building phase. These challenges include, but are not limited to, the massive numbers of candidate ruleitems found, the very large classifiers derived, the inability to handle multi-label datasets, and the design of rule pruning, ranking and prediction procedures. This article highlights and critically analyzes common challenges faced by AC algorithms that still persist. Hence, it opens the door for interested researchers to further investigate these challenges, with the hope of enhancing the overall performance of this approach and increasing its applicability in research domains. © 2016 IEEE.
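
To make the term "ruleitem" in the abstracts above concrete, the following is a minimal sketch of how support and confidence are commonly computed for a candidate ruleitem, assuming the usual CBA-style definitions: support is the fraction of all rows that match both the attribute values and the class, and confidence is the fraction of rows matching the attribute values that also carry the class. The function name, the attribute names and the toy data are hypothetical and are not taken from any of the listed papers.

```python
def ruleitem_stats(rows, condset, label):
    """Return (support, confidence) of the ruleitem <condset, label>.

    rows    -- list of (features dict, class label) pairs
    condset -- dict of attribute -> value forming the rule body
    label   -- class label forming the rule head
    """
    # Labels of all rows whose features contain every pair in condset.
    matched = [
        row_label
        for features, row_label in rows
        if all(features.get(attr) == value for attr, value in condset.items())
    ]
    if not rows or not matched:
        return 0.0, 0.0
    hits = sum(1 for row_label in matched if row_label == label)
    return hits / len(rows), hits / len(matched)

# Hypothetical toy data, for illustration only.
rows = [
    ({"url_has_ip": "yes", "https": "no"}, "phishing"),
    ({"url_has_ip": "yes", "https": "yes"}, "phishing"),
    ({"url_has_ip": "yes", "https": "yes"}, "legitimate"),
    ({"url_has_ip": "no", "https": "yes"}, "legitimate"),
]
support, confidence = ruleitem_stats(rows, {"url_has_ip": "yes"}, "phishing")
```

An AC algorithm would typically keep a ruleitem only when both values pass user-set thresholds; because every combination of attribute values is a potential rule body, the number of candidates grows quickly, which is the first challenge the abstract names.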