A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning

dc.contributor.author: Elreedy, Dina
dc.contributor.author: Atiya, Amir F.
dc.contributor.author: Kamalov, Firuz
dc.date.accessioned: 2023-02-16T06:23:06Z
dc.date.available: 2023-02-16T06:23:06Z
dc.date.copyright: © 2023
dc.date.issued: 2023
dc.description: This work is licensed under a Creative Commons License and the full text is openly accessible in the CUD Digital Repository. The version of record of this article is published in Machine Learning (2023) and is accessible online at https://doi.org/10.1007/s10994-022-06296-4
dc.description.abstract: Class imbalance occurs when the class distribution is not equal. Namely, one class is under-represented (minority class), and the other class has significantly more samples in the data (majority class). The class imbalance problem is prevalent in many real-world applications. Generally, the under-represented minority class is the class of interest. The synthetic minority over-sampling technique (SMOTE) is considered the most prominent method for handling unbalanced data. SMOTE generates new synthetic data patterns by performing linear interpolation between minority class samples and their K nearest neighbors. However, the SMOTE-generated patterns do not necessarily conform to the original minority class distribution. This paper develops a novel theoretical analysis of the SMOTE method by deriving the probability distribution of the SMOTE-generated samples. To the best of our knowledge, this is the first work deriving a mathematical formulation for the SMOTE patterns’ probability distribution. This allows us to compare the density of the generated samples with the true underlying class-conditional density, in order to assess how representative the generated samples are. The derived formula is verified by evaluating it on a number of densities and comparing the results against densities computed and estimated empirically. © 2023, The Author(s).
dc.identifier.citation: Elreedy, D., Atiya, A. F., & Kamalov, F. (2023). A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning. Machine Learning, https://doi.org/10.1007/s10994-022-06296-4
dc.identifier.issn: 08856125
dc.identifier.uri: https://doi.org/10.1007/s10994-022-06296-4
dc.identifier.uri: https://hdl.handle.net/20.500.12519/745
dc.language.iso: en_US
dc.publisher: Springer
dc.relation: Authors Affiliations : Elreedy, D., Computer Engineering Department, Cairo University, Giza, 12613, Egypt; Atiya, A.F., Computer Engineering Department, Cairo University, Giza, 12613, Egypt; Kamalov, F., Department of Electrical Engineering, Canadian University Dubai, Dubai, 117781, United Arab Emirates
dc.relation.ispartofseries: Machine Learning
dc.rights: Creative Commons Attribution 4.0 International License
dc.rights.holder: Copyright : © 2023, The Author(s).
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Class imbalance
dc.subject: Distribution density
dc.subject: Minority class
dc.subject: Over-sampling
dc.subject: SMOTE
dc.title: A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning
dc.type: Article

Files

Original bundle
Name: 745.pdf
Size: 1.31 MB
Format: Adobe Portable Document Format