Monotonicity of the χ²-statistic and Feature Selection

Date
2020
Authors
Kamalov, Firuz
Leung, Ho Hon
Moussa, Sherif
Journal Title
Journal ISSN
Volume Title
Publisher
Springer Science and Business Media Deutschland GmbH
Abstract
Feature selection is an important preprocessing step in analyzing large-scale data. In this paper, we prove the monotonicity property of the χ²-statistic and use it to construct a more robust feature selection method. In particular, we show that χ²_{Y,X₁} ≤ χ²_{Y,(X₁,X₂)}. This result indicates that a new feature should be added to an existing feature set only if it increases the χ²-statistic beyond a certain threshold. Our stepwise feature selection algorithm significantly reduces the number of features considered at each stage, making it more efficient than other similar methods. In addition, the selection process has a natural stopping point, thus eliminating the need for user input. Numerical experiments confirm that the proposed algorithm can significantly reduce the number of features required for classification and improve classifier accuracy. © 2020, Springer-Verlag GmbH Germany, part of Springer Nature.
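The stepwise criterion described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the contingency-table construction over discrete features, and the threshold value of 1.0 are all assumptions made here for clarity.

```python
from collections import Counter

def chi2_stat(y, features):
    """Chi-square statistic between labels y and the joint distribution
    of the given feature columns (each a list of discrete values)."""
    n = len(y)
    joint = Counter(zip(y, *features))   # observed (y, x1, ..., xk) cell counts
    y_marg = Counter(y)                  # marginal counts of the label
    x_marg = Counter(zip(*features))     # marginal counts of the joint feature
    stat = 0.0
    for yv, yc in y_marg.items():
        for xv, xc in x_marg.items():
            expected = yc * xc / n
            observed = joint.get((yv,) + xv, 0)
            stat += (observed - expected) ** 2 / expected
    return stat

def stepwise_select(y, X, threshold=1.0):
    """Greedily add the feature giving the largest chi-square increase;
    stop when no candidate raises the statistic by more than `threshold`.
    By the monotonicity result, the joint statistic never decreases when
    a feature is added, so each gain is nonnegative."""
    selected, current = [], 0.0
    remaining = set(range(len(X)))
    while remaining:
        best_stat, best_j = max(
            (chi2_stat(y, [X[k] for k in selected] + [X[j]]), j)
            for j in remaining
        )
        if best_stat - current <= threshold:  # natural stopping point
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current = best_stat
    return selected
```

For example, a feature identical to the label yields the maximal statistic, and a pure-noise feature adds nothing beyond the threshold, so the loop stops after one pick without any user input.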
Description
This article is not available at CUD collection. The version of scholarly record of this article is published in Annals of Data Science (2020), available online at: https://doi.org/10.1007/s40745-020-00251-7
Keywords
Big data, Feature selection, Machine learning, χ²-statistic
Citation
Kamalov, F., Leung, H.H. & Moussa, S. (2020). Monotonicity of the χ²-statistic and Feature Selection. Annals of Data Science. https://doi.org/10.1007/s40745-020-00251-7