Monotonicity of the χ2-statistic and Feature Selection

dc.contributor.author Kamalov, Firuz
dc.contributor.author Leung, Ho Hon
dc.contributor.author Moussa, Sherif
dc.date.accessioned 2020-12-12T05:58:50Z
dc.date.available 2020-12-12T05:58:50Z
dc.date.copyright © 2020
dc.date.issued 2020
dc.description This article is not available in the CUD collection. The version of record of this article is published in Annals of Data Science (2020), available online at: https://doi.org/10.1007/s40745-020-00251-7 en_US
dc.description.abstract Feature selection is an important preprocessing step in analyzing large-scale data. In this paper, we prove the monotonicity property of the χ2-statistic and use it to construct a more robust feature selection method. In particular, we show that $\chi^2_{Y,X_1} \le \chi^2_{Y,(X_1,X_2)}$. This result indicates that a new feature should be added to an existing feature set only if it increases the χ2-statistic beyond a certain threshold. Our stepwise feature selection algorithm significantly reduces the number of features considered at each stage, making it more efficient than other similar methods. In addition, the selection process has a natural stopping point, thus eliminating the need for user input. Numerical experiments confirm that the proposed algorithm can significantly reduce the number of features required for classification and improve classifier accuracy. © 2020, Springer-Verlag GmbH Germany, part of Springer Nature. en_US
dc.identifier.citation Kamalov, F., Leung, H.H. & Moussa, S. (2020). Monotonicity of the χ2-statistic and Feature Selection. Annals of Data Science. https://doi.org/10.1007/s40745-020-00251-7 en_US
dc.identifier.issn 21985804
dc.identifier.uri https://doi.org/10.1007/s40745-020-00251-7
dc.identifier.uri http://hdl.handle.net/20.500.12519/299
dc.language.iso en en_US
dc.publisher Springer Science and Business Media Deutschland GmbH en_US
dc.relation Author affiliations: Kamalov, F., Canadian University Dubai, Dubai, United Arab Emirates; Leung, H.H., UAE University, Al Ain, United Arab Emirates; Moussa, S., Canadian University Dubai, Dubai, United Arab Emirates
dc.relation.ispartofseries Annals of Data Science;
dc.rights Permission to reuse the abstract has been secured from Springer Science and Business Media Deutschland GmbH
dc.rights.holder Copyright : © 2020, Springer-Verlag GmbH Germany, part of Springer Nature.
dc.subject Big data en_US
dc.subject Feature selection en_US
dc.subject Machine learning en_US
dc.subject χ2-statistic en_US
dc.title Monotonicity of the χ2-statistic and Feature Selection en_US
dc.type Article en_US
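
The monotonicity property and the stepwise selection idea stated in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the helper names (`chi2_stat`, `joint_column`, `stepwise_select`), the toy data, and the `min_gain` stopping parameter are all assumptions made for illustration, and the record does not say how the paper sets its threshold. The sketch computes the Pearson χ2-statistic between a label Y and a feature, treats a feature pair (X1, X2) as a single tuple-valued feature, and adds a feature only while the joint statistic grows by at least `min_gain`.

```python
from collections import Counter


def chi2_stat(xs, ys):
    """Pearson chi-square statistic of the contingency table of xs against ys."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    stat = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            observed = joint.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def joint_column(rows, cols):
    """Combine the chosen feature columns into one tuple-valued feature."""
    return [tuple(row[c] for c in cols) for row in rows]


def stepwise_select(rows, ys, min_gain):
    """Greedy forward selection.

    min_gain is a hypothetical threshold (an assumption of this sketch):
    selection stops once no remaining feature raises the joint statistic
    by at least this amount.
    """
    selected, current = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining:
        scores = {
            c: chi2_stat(joint_column(rows, selected + [c]), ys)
            for c in remaining
        }
        best = max(scores, key=scores.get)
        if scores[best] - current < min_gain:
            break
        selected.append(best)
        current = scores[best]
        remaining.remove(best)
    return selected, current


# Toy data (invented): feature 0 determines y, feature 1 is unrelated,
# feature 2 duplicates feature 0.
rows = [
    (0, 0, 0), (0, 1, 0), (0, 0, 0), (0, 1, 0),
    (1, 0, 1), (1, 1, 1), (1, 0, 1), (1, 1, 1),
]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

# XOR example (invented): neither feature alone is associated with y,
# but the pair is, so the joint statistic is strictly larger.
xor_rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_ys = [0, 1, 1, 0]

single = chi2_stat(joint_column(xor_rows, [0]), xor_ys)
pair = chi2_stat(joint_column(xor_rows, [0, 1]), xor_ys)
selected, score = stepwise_select(rows, ys, min_gain=1.0)
```

On the toy data the duplicate and unrelated features add nothing to the joint statistic, so the loop stops after the first feature without the caller choosing a target feature count. The XOR case also shows a limit of any greedy scheme of this kind: when no single feature is informative on its own, the first step finds no gain.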