Ensemble Learning with Resampling for Imbalanced Data

Date

2021

Publisher

Springer Science and Business Media Deutschland GmbH

Abstract

Imbalanced class distribution is an issue that appears in various applications. In this paper, we undertake a comprehensive study of the effects of sampling on the performance of bootstrap aggregating (bagging) in the context of imbalanced data. Concretely, we compare sampling methods applied to single and ensemble classifiers. The experiments are conducted on simulated and real-life data using a range of sampling methods. The contributions of the paper are twofold: i) demonstrating the effectiveness of ensemble techniques based on resampled data over a single base classifier, and ii) comparing the effectiveness of different resampling techniques when used during the bagging stage of ensemble classifiers. The results reveal that ensemble methods trained on resampled data overwhelmingly outperform single classifiers trained on resampled data. In addition, we find that NearMiss and random oversampling (ROS) are the most effective sampling algorithms for ensemble learning. © 2021, Springer Nature Switzerland AG.
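To make "resampling during the bagging stage" concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation, and this record does not describe the paper's base classifiers or experimental protocol. Each bootstrap sample is rebalanced with random oversampling (ROS), one of the two samplers the abstract highlights, before a base learner is trained on it, and predictions are aggregated by majority vote. The decision-stump base learner, every function name, and the parameters `n_estimators=11` and `seed=0` are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, rng):
    """Random oversampling (ROS): duplicate randomly chosen rows of each
    smaller class until every class matches the largest class count."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(members)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


def fit_stump(X, y):
    """Hypothetical base learner: the single-feature threshold rule with the
    fewest training errors (a decision stump)."""
    labels = sorted(set(y))
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for left in labels:
                for right in labels:
                    errors = sum((left if x[f] <= t else right) != lab
                                 for x, lab in zip(X, y))
                    if best is None or errors < best[0]:
                        best = (errors, f, t, left, right)
    return best[1:]  # (feature, threshold, left_label, right_label)


def predict_stump(stump, x):
    f, t, left, right = stump
    return left if x[f] <= t else right


def fit_resampled_bagging(X, y, n_estimators=11, seed=0):
    """Bagging with a resampling step applied to every bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_estimators):
        # 1) Draw a bootstrap sample: n rows chosen with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        # 2) Rebalance this bootstrap sample with ROS.
        Xb, yb = random_oversample(Xb, yb, rng)
        # 3) Train one base learner on the rebalanced sample.
        ensemble.append(fit_stump(Xb, yb))
    return ensemble


def predict_bagging(ensemble, x):
    """Aggregate the base learners by majority vote."""
    votes = Counter(predict_stump(s, x) for s in ensemble)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy imbalanced data: 20 majority points (class 0), 3 minority points (class 1).
    X = [[float(v)] for v in range(20)] + [[30.0], [31.0], [32.0]]
    y = [0] * 20 + [1] * 3
    model = fit_resampled_bagging(X, y)
    print(predict_bagging(model, [5.0]), predict_bagging(model, [31.0]))
```

Resampling inside the loop, rather than once before bagging, gives every base learner its own balanced sample with its own random duplicates. Swapping `random_oversample` for an undersampler such as NearMiss changes only step 2. The imbalanced-learn library provides ready-made counterparts of these pieces (`RandomOverSampler`, `NearMiss`, `BalancedBaggingClassifier`).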

Description

This work is not available in the CUD collection. The version of record of this work is published in Lecture Notes in Computer Science (2021), available online at: https://doi.org/10.1007/978-3-030-84529-2_48

Keywords

Data preprocessing sampling, Ensemble method, Imbalanced data, Oversampling, Undersampling

Citation

Kamalov, F., Elnagar, A., & Leung, H. H. (2021). Ensemble Learning with Resampling for Imbalanced Data. In D.-S. Huang, K.-H. Jo, J. Li, V. Gribova, & A. Hussain (Eds.), Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol 12837 (pp. 564-578). Springer, Cham. https://doi.org/10.1007/978-3-030-84529-2_48

DOI

https://doi.org/10.1007/978-3-030-84529-2_48