Ensemble Learning with Resampling for Imbalanced Data

Kamalov, Firuz
Elnagar, Ashraf
Leung, Ho Hon
Springer Science and Business Media Deutschland GmbH
Imbalanced class distribution is an issue that appears in various applications. In this paper, we undertake a comprehensive study of the effects of sampling on the performance of bootstrap aggregating in the context of imbalanced data. Concretely, we carry out a comparison of sampling methods applied to single and ensemble classifiers. The experiments are conducted on simulated and real-life data using a range of sampling methods. The contributions of the paper are twofold: i) demonstrate the effectiveness of ensemble techniques based on resampled data over a single base classifier and ii) compare the effectiveness of different resampling techniques when used during the bagging stage for ensemble classifiers. The results reveal that ensemble methods overwhelmingly outperform single classifiers based on resampled data. In addition, we discover that NearMiss and random oversampling (ROS) are the optimal sampling algorithms for ensemble learning. © 2021, Springer Nature Switzerland AG.
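As a rough illustration of what "resampling during the bagging stage" can look like, the following Python sketch draws a bootstrap sample for each ensemble member, balances it with random oversampling (ROS), trains a base learner on the balanced sample, and combines predictions by majority vote. It is not the authors' implementation: the function names, the nearest-centroid base learner, and the default of 25 estimators are assumptions chosen to keep the example free of external dependencies.

```python
import random
from collections import Counter


def random_oversample(X, y, rng):
    """ROS: duplicate randomly chosen rows until every class matches the largest one."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_res, y_res = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_res.append(row)
            y_res.append(label)
    return X_res, y_res


def fit_centroids(X, y):
    """Placeholder base learner (an assumption): one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, value in enumerate(xi):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(model, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(model[label], x)))


def fit_resampled_bagging(X, y, n_estimators=25, seed=0):
    """Train one base learner per bootstrap sample, resampling each sample first."""
    rng = random.Random(seed)
    n_samples, n_classes = len(X), len(set(y))
    models = []
    while len(models) < n_estimators:
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        X_boot = [X[i] for i in idx]
        y_boot = [y[i] for i in idx]
        if len(set(y_boot)) < n_classes:
            continue  # the bootstrap missed a class entirely; draw again
        X_bal, y_bal = random_oversample(X_boot, y_boot, rng)
        models.append(fit_centroids(X_bal, y_bal))
    return models


def predict_bagging(models, x):
    """Majority vote over the ensemble members."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

A different sampler (for example an undersampling method such as NearMiss) could be swapped in for `random_oversample` without changing the rest of the loop, and a stronger base classifier would normally replace the toy centroid model.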
This conference paper is not available in the CUD collection. The version of record of this conference paper is published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2021), available online at: https://doi.org/10.1007/978-3-030-84529-2_48
Data preprocessing sampling, Ensemble method, Imbalanced data, Oversampling, Undersampling
Kamalov, F., Elnagar, A., & Leung, H. H. (2021). Ensemble Learning with Resampling for Imbalanced Data. In D.-S. Huang, K.-H. Jo, J. Li, V. Gribova, & A. Hussain (Eds.), Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol 12837 (pp. 564-578). Springer, Cham. https://doi.org/10.1007/978-3-030-84529-2_48