Ensemble Learning with Resampling for Imbalanced Data
Springer Science and Business Media Deutschland GmbH
Imbalanced class distribution is an issue that appears in various applications. In this paper, we undertake a comprehensive study of the effects of sampling on the performance of bootstrap aggregating in the context of imbalanced data. Concretely, we carry out a comparison of sampling methods applied to single and ensemble classifiers. The experiments are conducted on simulated and real-life data using a range of sampling methods. The contributions of the paper are twofold: i) demonstrate the effectiveness of ensemble techniques based on resampled data over a single base classifier and ii) compare the effectiveness of different resampling techniques when used during the bagging stage for ensemble classifiers. The results reveal that ensemble methods overwhelmingly outperform single classifiers based on resampled data. In addition, we discover that NearMiss and random oversampling (ROS) are the optimal sampling algorithms for ensemble learning. © 2021, Springer Nature Switzerland AG.
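As a rough illustration of what resampling "during the bagging stage" can mean, the following minimal sketch draws a bootstrap sample for each ensemble member and balances it with random oversampling (ROS) before the member is used for voting. It is not the authors' implementation: the k-nearest-neighbour base learner, the toy data generator, and all function names (random_oversample, knn_predict, fit_resampled_bagging, predict, make_toy_data) are assumptions chosen only to keep the example self-contained.

```python
import random
from collections import Counter


def random_oversample(X, y, rng):
    """Random oversampling (ROS): duplicate randomly chosen samples of each
    smaller class until every class matches the largest class in size."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def knn_predict(X_train, y_train, x, k=3):
    """Toy base learner (an assumption of this sketch): k-nearest neighbours."""
    order = sorted(
        range(len(X_train)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)),
    )
    votes = Counter(y_train[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def fit_resampled_bagging(X, y, n_estimators=25, seed=0):
    """Draw one bootstrap sample per ensemble member, then balance it with ROS."""
    rng = random.Random(seed)
    n = len(X)
    members = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        # k-NN is a lazy learner, so "fitting" just stores the resampled data.
        members.append(random_oversample(Xb, yb, rng))
    return members


def predict(members, x, k=3):
    """Majority vote over the ensemble members."""
    votes = Counter(knn_predict(Xr, yr, x, k) for Xr, yr in members)
    return votes.most_common(1)[0][0]


def make_toy_data(n_major=95, n_minor=5, seed=1):
    """Synthetic 2-D data: majority class 0 near (0, 0), minority class 1 near (3, 3)."""
    rng = random.Random(seed)
    X = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(n_major)]
    X += [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(n_minor)]
    y = [0] * n_major + [1] * n_minor
    return X, y
```

Calling fit_resampled_bagging on the output of make_toy_data and then predict on a query point gives the ensemble's majority vote. Swapping random_oversample for an undersampling routine such as NearMiss would be the analogous way to try the other method the abstract highlights.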
This work is not available in the CUD collection. The version of the scholarly record of this work is published in Lecture Notes in Computer Science (2021), available online at: https://doi.org/10.1007/978-3-030-84529-2_48
Data preprocessing sampling, Ensemble method, Imbalanced data, Oversampling, Undersampling
Kamalov, F., Elnagar, A., & Leung, H. H. (2021). Ensemble Learning with Resampling for Imbalanced Data. In D.-S. Huang, K.-H. Jo, J. Li, V. Gribova, & A. Hussain (Eds.), Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol 12837 (pp. 564-578). Springer, Cham. https://doi.org/10.1007/978-3-030-84529-2_48