Least Loss: A simplified filter method for feature selection

Date
2020-09
Authors
Thabtah, Fadi
Kamalov, Firuz
Hammoud, Suhel
Shahamiri, Seyed Reza
Publisher
Elsevier Inc.
Abstract
Identifying the relevant set of features in a dataset is an important part of data analytics. Discarding significant variables or keeping irrelevant variables can substantially affect the performance of the learning algorithm during knowledge discovery. In this paper, a feature selection method called Least Loss (L2) is proposed that significantly reduces the dimensionality of data by disposing of weakly correlated variables in a robust manner without diminishing the predictive performance of classifiers. The proposed method is based on quantifying the similarity between the observed and expected probabilities and generating a score for each independent variable, which makes it simple and intuitive. The proposed method was evaluated by comparing its performance against the Information Gain (IG) and Chi Square (CHI) feature selection methods on 27 different datasets modeled using a probabilistic classifier. The results reveal that L2 is highly competitive with respect to error rate, precision, and recall measures while substantially reducing the number of selected variables in the datasets. Our study should be of high interest to data analysts, scholars, and domain experts who deal with applications that include large numbers of features using statistical analysis methods. © 2020 Elsevier Inc.
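The abstract does not state the scoring formula, so the following is only a minimal sketch of the general idea of comparing observed and expected probabilities per variable; it should not be read as the L2 method itself. It assumes categorical features, takes the "expected" probability of a (feature value, class) pair to be the product of the marginals under independence, and sums the squared gaps. The function name `independence_gap_scores`, the squared-gap aggregation, and the toy data are all illustrative assumptions.

```python
from collections import Counter


def independence_gap_scores(rows, labels):
    """Return one score per feature column.

    Assumed scoring (not taken from the paper): the sum over all
    (value, class) pairs of (observed joint probability minus the
    probability expected under independence) squared.
    """
    n = len(labels)
    class_prob = {c: k / n for c, k in Counter(labels).items()}
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        value_prob = {v: k / n for v, k in Counter(column).items()}
        joint_counts = Counter(zip(column, labels))
        score = 0.0
        for v, pv in value_prob.items():
            for c, pc in class_prob.items():
                observed = joint_counts[(v, c)] / n  # P(value, class)
                expected = pv * pc                   # P(value) * P(class)
                score += (observed - expected) ** 2
        scores.append(score)
    return scores


# Toy data: feature 0 determines the class, feature 1 is unrelated to it.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = [1, 1, 0, 0]
scores = independence_gap_scores(rows, labels)

# Rank features from most to least associated with the class.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

Under this assumed score, a feature that is independent of the class scores zero, so low-scoring variables would be the candidates to discard; where to put the cutoff is left open here because the abstract does not say how it is chosen.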
Description
This article is not available in the CUD collection. The version of record of this article is published in Information Sciences (2020), available online at: https://doi.org/10.1016/j.ins.2020.05.017
Keywords
Classification, Data mining, Dimensionality reduction, Feature selection, Information science, Machine learning, Ranking of variables, Classification (of information), Data Analytics, Correlated variables, Feature selection methods, Independent variables, Irrelevant variables, Predictive performance, Probabilistic classifiers, Significant variables, Statistical analysis methods, Feature extraction
Citation
Thabtah, F., Kamalov, F., Hammoud, S., & Shahamiri, S. R. (2020). Least Loss: A simplified filter method for feature selection. Information Sciences, 534, 1-15. https://doi.org/10.1016/j.ins.2020.05.017