Feature Selection in Imbalanced Data

dc.contributor.author Kamalov, Firuz
dc.contributor.author Thabtah, Fadi
dc.contributor.author Leung, Ho Hon
dc.date.accessioned 2022-02-16T15:29:29Z
dc.date.available 2022-02-16T15:29:29Z
dc.date.copyright © 2021
dc.date.issued 2022
dc.description This article is not available in the CUD collection. The version of record is published in Annals of Data Science (2022) and is available online at: https://doi.org/10.1007/s40745-021-00366-5 en_US
dc.description.abstract The traditional feature selection methods are not suitable for imbalanced data as they tend to be biased towards the majority class. This problem is particularly acute in the field of medical diagnostics and fraud detection where the class distribution is highly skewed. In this paper, we propose a novel filter approach using decision tree-based F1-score. The F1-score incorporates the accuracy with respect to the minority class data and hence is a good measure in the case of imbalanced data. In the proposed implementation, the F1-score is calculated based on a 1-dimensional decision tree classifier resulting in a fast and effective feature evaluation method. Numerical experiments confirm that the proposed method achieves robust dimensionality reduction and accuracy results. In addition, the low computational complexity of the algorithm makes it a practical choice for big data applications. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature. en_US
dc.identifier.citation Kamalov, F., Thabtah, F., & Leung, H. H. (2022). Feature selection in imbalanced data. Annals of Data Science, https://doi.org/10.1007/s40745-021-00366-5 en_US
dc.identifier.issn 21985804
dc.identifier.uri https://doi.org/10.1007/s40745-021-00366-5
dc.identifier.uri http://hdl.handle.net/20.500.12519/514
dc.language.iso en en_US
dc.publisher Springer Science and Business Media Deutschland GmbH en_US
dc.relation Author affiliations: Kamalov, F., Canadian University of Dubai, Dubai, United Arab Emirates; Thabtah, F., Manukau Institute of Technology, Manukau, New Zealand; Leung, H.H., UAE University, Al Ain, United Arab Emirates
dc.relation.ispartofseries Annals of Data Science
dc.rights License to reuse the abstract has been secured from Springer Nature and Copyright Clearance Center.
dc.rights.holder Copyright: © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
dc.rights.license License number: 5250800104747; License date: Feb 16, 2022
dc.rights.uri https://s100.copyright.com/CustomerAdmin/PLF.jsp?ref=c65c4d0c-e3ff-4f08-b7a8-401525d578da
dc.subject Big data en_US
dc.subject Data mining en_US
dc.subject F1-score en_US
dc.subject Feature selection en_US
dc.subject Filter method en_US
dc.subject Imbalanced data en_US
dc.subject Machine learning en_US
dc.title Feature Selection in Imbalanced Data en_US
dc.type Article en_US
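
Illustrative note: the abstract describes scoring each feature by the F1-score of a 1-dimensional decision tree classifier. The sketch below shows one way that idea could look in plain Python. It is not the authors' implementation. It assumes a depth-1 tree (a single threshold chosen by Gini impurity), binary labels with 1 as the minority class, and F1 measured on the training data; the function names (`fit_stump`, `f1_minority`, `score_features`) and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def gini(pos: int, neg: int) -> float:
    """Gini impurity of a node with `pos` positive and `neg` negative samples."""
    n = pos + neg
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def fit_stump(x: Sequence[float], y: Sequence[int]) -> Tuple[float, int, int]:
    """Fit a depth-1 tree on a single feature (assumed simplification).

    Returns (threshold, left_label, right_label); samples with
    value <= threshold receive left_label, the rest right_label.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total_pos = sum(y)
    majority = int(total_pos > n - total_pos)
    # Start from "no split": every sample falls in the left leaf.
    best = (gini(total_pos, n - total_pos), pairs[-1][0], majority, majority)
    left_pos = 0
    for i in range(n - 1):
        left_pos += pairs[i][1]
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # cannot split between identical feature values
        n_left = i + 1
        left_neg = n_left - left_pos
        right_pos = total_pos - left_pos
        right_neg = (n - n_left) - right_pos
        impurity = (n_left * gini(left_pos, left_neg)
                    + (n - n_left) * gini(right_pos, right_neg)) / n
        if impurity < best[0]:
            threshold = (pairs[i][0] + pairs[i + 1][0]) / 2.0
            best = (impurity, threshold,
                    int(left_pos > left_neg), int(right_pos > right_neg))
    return best[1], best[2], best[3]


def f1_minority(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1-score with label 1 (the minority class) treated as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def score_features(rows: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Score every feature by the training F1 of its own 1-dimensional stump."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        threshold, left_label, right_label = fit_stump(column, y)
        y_pred = [left_label if v <= threshold else right_label for v in column]
        scores.append(f1_minority(y, y_pred))
    return scores


# Toy imbalanced data (made up): 7 majority samples (0), 3 minority samples (1).
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
informative = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 1.0, 1.1]
noisy = [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.45]
X = [[a, b] for a, b in zip(informative, noisy)]

scores = score_features(X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(scores)   # [1.0, 0.0]
print(ranking)  # [0, 1]
```

Because each feature is evaluated independently with one sort and one pass over the data, the cost grows roughly linearly with the number of features, which is consistent with the low computational complexity mentioned in the abstract. A practical version would likely use a deeper tree and held-out data rather than training F1.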