Monotonicity of the χ²-statistic and Feature Selection

dc.contributor.author: Kamalov, Firuz
dc.contributor.author: Leung, Ho Hon
dc.contributor.author: Moussa, Sherif
dc.date.accessioned: 2020-12-12T05:58:50Z
dc.date.available: 2020-12-12T05:58:50Z
dc.date.copyright: © 2020
dc.date.issued: 2022
dc.description: This article is not available in the CUD collection. The version of scholarly record of this article is published in Annals of Data Science (2020), available online at: https://doi.org/10.1007/s40745-020-00251-7 [en_US]
dc.description.abstract: Feature selection is an important preprocessing step in analyzing large scale data. In this paper, we prove the monotonicity property of the χ²-statistic and use it to construct a more robust feature selection method. In particular, we show that $\chi^2_{Y,X_1} \le \chi^2_{Y,(X_1,X_2)}$. This result indicates that a new feature should be added to an existing feature set only if it increases the χ²-statistic beyond a certain threshold. Our stepwise feature selection algorithm significantly reduces the number of features considered at each stage making it more efficient than other similar methods. In addition, the selection process has a natural stopping point thus eliminating the need for user input. Numerical experiments confirm that the proposed algorithm can significantly reduce the number of features required for classification and improve classifier accuracy. © 2020, Springer-Verlag GmbH Germany, part of Springer Nature. [en_US]
dc.identifier.citation: Kamalov, F., Leung, H. H., & Moussa, S. (2022). Monotonicity of the χ²-statistic and feature selection. Annals of Data Science, 9(6), 1223-1241. https://doi.org/10.1007/s40745-020-00251-7 [en_US]
dc.identifier.issn: 21985804
dc.identifier.uri: https://doi.org/10.1007/s40745-020-00251-7
dc.identifier.uri: http://hdl.handle.net/20.500.12519/299
dc.language.iso: en [en_US]
dc.publisher: Springer Science and Business Media Deutschland GmbH [en_US]
dc.relation: Authors Affiliations: Kamalov, F., Canadian University Dubai, Dubai, United Arab Emirates; Leung, H.H., UAE University, Al Ain, United Arab Emirates; Moussa, S., Canadian University Dubai, Dubai, United Arab Emirates
dc.relation.ispartofseries: Annals of Data Science; Volume 9, Issue 6
dc.rights: Permission to reuse the abstract has been secured from Springer Science and Business Media Deutschland GmbH
dc.rights.holder: Copyright: © 2020, Springer-Verlag GmbH Germany, part of Springer Nature.
dc.subject: Big data [en_US]
dc.subject: Feature selection [en_US]
dc.subject: Machine learning [en_US]
dc.subject: χ²-statistic [en_US]
dc.title: Monotonicity of the χ²-statistic and Feature Selection [en_US]
dc.type: Article [en_US]
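
Illustration (not part of the record): the monotonicity inequality stated in the abstract, that the χ²-statistic of Y against X1 does not exceed that of Y against the pair (X1, X2), can be checked numerically. The following is a minimal sketch, assuming categorical features encoded as non-negative integers and the standard Pearson statistic on a contingency table. The helper names `chi2_stat` and `joint_codes` are hypothetical and are not taken from the article.

```python
import numpy as np

def chi2_stat(y, x):
    """Pearson chi-square statistic of the contingency table of y versus x."""
    # Map the observed categories to consecutive integer codes.
    y_codes = np.unique(y, return_inverse=True)[1].ravel()
    x_codes = np.unique(x, return_inverse=True)[1].ravel()
    # Build the table of observed counts.
    table = np.zeros((y_codes.max() + 1, x_codes.max() + 1))
    np.add.at(table, (y_codes, x_codes), 1)
    # Expected counts under independence; all positive because only
    # observed categories get a row or a column.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())

def joint_codes(x1, x2):
    """Encode the pair (x1, x2) of non-negative integer features as one feature."""
    x1 = np.asarray(x1)
    x2 = np.asarray(x2)
    return x1 * (x2.max() + 1) + x2

# Synthetic data (assumption: purely illustrative, not from the article).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
x1 = rng.integers(0, 3, size=500)
x2 = rng.integers(0, 4, size=500)

single = chi2_stat(y, x1)
joint = chi2_stat(y, joint_codes(x1, x2))
print(f"chi2(Y, X1) = {single:.4f}")
print(f"chi2(Y, (X1, X2)) = {joint:.4f}")
print("monotonicity holds:", single <= joint + 1e-9)
```

Because the joint statistic can never be smaller, a raw increase alone says little about whether X2 is useful, which is consistent with the abstract's point that a new feature should be added only when the increase exceeds a threshold. The threshold itself is defined in the article and is not reproduced here.
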
Files
Original bundle
Name: Access Instruction 299.pdf
Size: 93.08 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.01 KB
Format: Item-specific license agreed upon to submission