A robust domain partitioning intrusion detection method
Abstract
The capacity of data mining algorithms to learn rules from data is influenced by, inter alia, the random nature of training and test data and by the diversity of domain partitioning models. Separating normal from malicious traffic across networks is a routine task that is naturally affected by that randomness and diversity. We propose a robust algorithm, Sample-Measure-Assess (SMA), that detects intrusions based on rules learnt from multiple samples. We adapt data obtained from a set of simulations, capturing data attributes identifiable by number of bytes, destination and source of packets, protocol and nature of data flows (normal and abnormal), as well as IP addresses. A fixed sample of 82,332 observations on 27 variables was drawn from a superset of 2.54 million observations on 49 variables; multiple samples were then repeatedly extracted from the fixed sample and used, via the algorithm, to train and test multiple versions of classifiers. With two class labels, binary and multi-class, the dataset presents a classic example of masked and spurious groupings, making it an ideal case for concept learning. The algorithm learns a model for the underlying distributions of the samples and provides mechanics for model assessment. These settings account for our method's novelty, i.e., its ability to learn concept rules from highly masked to highly spurious cases while monitoring model robustness. A comparative analysis of Random Forests and individually grown trees shows that we can circumvent the former's dependence on multicollinearity of the trees and their individual strength in the forest by proceeding from dimensional reduction to classification using individual trees. Given data of similar structure, the algorithm can order the models in terms of optimality, which means our work can contribute towards understanding the concept of normal and malicious flows across tools.
The algorithm yields results that are less sensitive to violated distributional assumptions; hence, it yields robust parameters and provides a generalisation that can be monitored and adapted to specific low levels of variability. We discuss its potential for deployment with other classifiers and for extension into other applications, simply by adapting the objectives to specific conditions. © 2019
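
The repeated sample, train and test cycle described in the abstract can be illustrated with a minimal sketch. This is not the authors' SMA implementation: the synthetic one-feature data, the threshold classifier (`fit_stump`), the 50/50 train/test split and all function names are assumptions made purely for illustration. The sketch draws repeated samples from a fixed pool, measures test accuracy on each, and assesses robustness through the spread of those accuracies.

```python
import random
import statistics


def make_data(n, rng):
    """Synthetic two-class pool (assumption): one feature, label 1 = 'abnormal'."""
    rows = []
    for _ in range(n):
        label = rng.random() < 0.5
        x = rng.gauss(2.0 if label else 0.0, 1.0)
        rows.append((x, int(label)))
    return rows


def fit_stump(train):
    """Hypothetical stand-in classifier: the threshold on x with the best training accuracy."""
    best_t, best_acc = None, -1.0
    for t, _ in train:
        acc = sum((x > t) == bool(y) for x, y in train) / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def accuracy(threshold, rows):
    """Share of rows whose predicted class (x > threshold) matches the label."""
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)


def sample_measure_assess(data, n_rounds, sample_size, rng):
    """Repeat: draw a sample, train and test on it; then summarise the scores."""
    scores = []
    for _ in range(n_rounds):
        sample = rng.sample(data, sample_size)      # Sample
        cut = sample_size // 2
        train, test = sample[:cut], sample[cut:]
        threshold = fit_stump(train)
        scores.append(accuracy(threshold, test))    # Measure
    # Assess: a small standard deviation suggests a stable model
    return statistics.mean(scores), statistics.stdev(scores)


rng = random.Random(0)
data = make_data(2000, rng)
mean_acc, sd_acc = sample_measure_assess(data, n_rounds=20, sample_size=200, rng=rng)
print(f"mean accuracy = {mean_acc:.3f}, standard deviation = {sd_acc:.3f}")
```

Under these assumptions, competing models could be ordered by comparing their mean accuracy and its spread across the same repeated samples.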