Kernel density estimation based sampling for imbalanced class distribution

dc.contributor.author Kamalov, Firuz
dc.date.accessioned 2021-03-24T05:55:00Z
dc.date.available 2021-03-24T05:55:00Z
dc.date.copyright © 2019
dc.date.issued 2020-02
dc.description This article is not available in the CUD collection. The version of record is published in Information Sciences (2020), available online at: https://doi.org/10.1016/j.ins.2019.10.017 en_US
dc.description.abstract Imbalanced response variable distribution is a common occurrence in data science. In fields such as fraud detection, medical diagnostics, system intrusion detection, and many others where abnormal behavior is rarely observed, the data under study often feature a disproportionate target class distribution. One common way to combat class imbalance is to resample the minority class to achieve a more balanced distribution. In this paper, we investigate the performance of a sampling method based on kernel density estimation (KDE). We believe that KDE offers a more natural way to generate new instances of the minority class and is less prone to overfitting than other standard sampling techniques. It is based on the well-established theory of nonparametric statistical estimation. Numerical experiments show that KDE can outperform other sampling techniques on a range of real-life datasets as measured by F1-score and G-mean. The results remain consistent across a number of classification algorithms used in the experiments. Furthermore, the proposed method outperforms the benchmark methods regardless of the class distribution ratio. We conclude, based on the solid theoretical foundation and strong experimental results, that the proposed method would be a valuable tool in problems involving imbalanced class distribution. © 2019 Elsevier Inc. en_US
dc.identifier.citation Kamalov, F. (2020). Kernel density estimation based sampling for imbalanced class distribution. Information Sciences, 512, 1192-1201. https://doi.org/10.1016/j.ins.2019.10.017 en_US
dc.identifier.issn 00200255
dc.identifier.uri https://doi.org/10.1016/j.ins.2019.10.017
dc.identifier.uri http://hdl.handle.net/20.500.12519/353
dc.language.iso en en_US
dc.publisher Elsevier Inc. en_US
dc.relation Author Affiliation : Kamalov, F., Department of Electrical Engineering, Canadian University Dubai, Dubai, United Arab Emirates
dc.relation.ispartofseries Information Sciences;Volume 512
dc.rights License to reuse the abstract has been secured from Elsevier and Copyright Clearance Center.
dc.rights.holder Copyright : © 2019 Elsevier Inc.
dc.rights.uri https://s100.copyright.com/CustomerAdmin/PLF.jsp?ref=4c2dc004-878e-48fb-8915-385996c4a9d4
dc.subject Class imbalance en_US
dc.subject Imbalanced data en_US
dc.subject KDE en_US
dc.subject Kernel en_US
dc.subject Oversampling en_US
dc.subject Sampling en_US
dc.subject Diagnosis en_US
dc.subject Statistics en_US
dc.subject Classification algorithm en_US
dc.subject Kernel Density Estimation en_US
dc.subject Statistical estimation en_US
dc.subject Theoretical foundations en_US
dc.subject Intrusion detection en_US
dc.title Kernel density estimation based sampling for imbalanced class distribution en_US
dc.type Article en_US
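
The resampling idea summarized in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the record does not state the kernel, the bandwidth rule, or any function names, so the Gaussian kernel, the Scott's-rule bandwidth with a per-feature (diagonal) width, the binary-class setting, and the helper names kde_oversample and balance are all assumptions made for illustration. Sampling from a Gaussian KDE amounts to picking an observed minority point at random and perturbing it with Gaussian noise scaled by the bandwidth.

```python
import random
import statistics


def kde_oversample(minority, n_new, seed=None):
    """Draw n_new synthetic points from a Gaussian KDE fitted to `minority`.

    minority: list of equal-length feature lists (at least two rows).
    Assumption: kernel width per feature is Scott's factor n**(-1/(d+4))
    times that feature's sample standard deviation (diagonal bandwidth).
    """
    rng = random.Random(seed)
    n, d = len(minority), len(minority[0])
    factor = n ** (-1.0 / (d + 4))
    widths = [
        factor * statistics.stdev(row[j] for row in minority)
        for j in range(d)
    ]
    samples = []
    for _ in range(n_new):
        # A KDE is an equal-weight mixture of kernels centred on the data:
        # choose one centre uniformly, then draw from its kernel.
        centre = rng.choice(minority)
        samples.append([rng.gauss(centre[j], widths[j]) for j in range(d)])
    return samples


def balance(X, y, minority_label, seed=None):
    """Oversample the minority class until both classes have equal size.

    Assumption: binary problem, so every row not labelled
    `minority_label` belongs to the majority class.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = (len(X) - len(minority)) - len(minority)
    if n_needed <= 0:
        return list(X), list(y)
    new_points = kde_oversample(minority, n_needed, seed)
    return list(X) + new_points, list(y) + [minority_label] * n_needed


if __name__ == "__main__":
    # Toy data: 10 majority rows (label 0) and 3 minority rows (label 1).
    X = [[float(i), float(i % 3)] for i in range(10)]
    X += [[100.0, 50.0], [101.0, 52.0], [99.0, 51.0]]
    y = [0] * 10 + [1] * 3
    X_bal, y_bal = balance(X, y, minority_label=1, seed=0)
    print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

Unlike duplicating minority rows, each synthetic point here is a fresh draw from a smooth density estimate, which is the property the abstract credits with reducing overfitting. A full-covariance bandwidth or a library estimator could be substituted for the diagonal simplification used in this sketch.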