Feature Selection in Imbalanced Data

Kamalov, Firuz; Thabtah, Fadi; Leung, Ho Hon

Date

2023-12

Authors

Kamalov, Firuz

Thabtah, Fadi

Leung, Ho Hon

Publisher

Springer Science and Business Media Deutschland GmbH

Abstract

Traditional feature selection methods are not well suited to imbalanced data because they tend to be biased towards the majority class. The problem is particularly acute in fields such as medical diagnostics and fraud detection, where the class distribution is highly skewed. In this paper, we propose a novel filter approach based on a decision tree F1-score. Because the F1-score combines precision and recall with respect to the minority class, it is a good performance measure for imbalanced data. In the proposed implementation, the F1-score is calculated from a one-dimensional decision tree classifier, resulting in a fast and effective feature evaluation method. Numerical experiments confirm that the proposed method achieves robust dimensionality reduction and accuracy results. In addition, the low computational complexity of the algorithm makes it a practical choice for big data applications. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
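
The abstract describes the scoring step only at a high level. The sketch below shows one way the general idea could look: each feature is scored on its own by fitting a small decision tree to that single feature and computing the minority-class F1-score, F1 = 2TP / (2TP + FP + FN). Features are then ranked by that score. This is not the authors' published implementation. The Gini splitting criterion, the tree depth (max_depth, default 3), scoring on the training data rather than a held-out split, and all function names (gini, build_tree, predict, minority_f1, rank_features) are assumptions made for illustration. It uses plain Python so it runs without external libraries.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(pairs, depth, max_depth):
    """Grow a 1-D decision tree on (value, label) pairs sorted by value.

    Leaves predict the majority label; splits minimise weighted Gini impurity.
    Assumption: Gini splitting is an illustrative choice, not taken from the paper.
    """
    labels = [lab for _, lab in pairs]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return ("leaf", majority)
    best = None  # (weighted Gini, split index)
    n = len(pairs)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        score = (i * gini(labels[:i]) + (n - i) * gini(labels[i:])) / n
        if best is None or score < best[0]:
            best = (score, i)
    if best is None:  # constant feature: nothing to split on
        return ("leaf", majority)
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return ("node", threshold,
            build_tree(pairs[:i], depth + 1, max_depth),
            build_tree(pairs[i:], depth + 1, max_depth))


def predict(tree, value):
    """Route a single feature value down the tree to a leaf label."""
    while tree[0] == "node":
        _, threshold, left, right = tree
        tree = left if value <= threshold else right
    return tree[1]


def minority_f1(y_true, y_pred, pos):
    """F1-score for the minority label pos: 2TP / (2TP + FP + FN)."""
    tp = sum(t == pos and p == pos for t, p in zip(y_true, y_pred))
    fp = sum(t != pos and p == pos for t, p in zip(y_true, y_pred))
    fn = sum(t == pos and p != pos for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def rank_features(X, y, max_depth=3):
    """Score each column of X by the minority-class F1 of a 1-D tree.

    Assumption: max_depth=3 and scoring on the training data are illustrative
    choices, not parameters taken from the paper.
    Returns (feature indices sorted best-first, list of per-feature scores).
    """
    counts = Counter(y)
    pos = min(counts, key=counts.get)  # least frequent label = minority class
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        tree = build_tree(sorted(zip(column, y)), 0, max_depth)
        preds = [predict(tree, v) for v in column]
        scores.append(minority_f1(y, preds, pos))
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    return order, scores
```

Because each tree sees only one column, the cost per feature does not grow with the number of other features, which is in the spirit of the low-complexity filter the abstract describes. A feature whose values cleanly separate the minority class scores 1.0. A constant feature scores 0.0, because its single-leaf tree always predicts the majority class. A user would then keep the top-ranked features.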

Keywords

Big data, Data mining, F1-score, Feature selection, Filter method, Imbalanced data, Machine learning

Citation

Kamalov, F., Thabtah, F., & Leung, H. H. (2023). Feature selection in imbalanced data. Annals of Data Science, 10(6), 1527-1541. https://doi.org/10.1007/s40745-021-00366-5

URI

https://doi.org/10.1007/s40745-021-00366-5
http://hdl.handle.net/20.500.12519/514

Collections

Department of Electrical Engineering
