Data imbalance in classification : experimental evaluation

dc.contributor.author Thabtah, Fadi
dc.contributor.author Hammoud, Suhel
dc.contributor.author Kamalov, Firuz
dc.contributor.author Gonsalves, Amanda
dc.date.accessioned 2021-03-24T05:17:37Z
dc.date.available 2021-03-24T05:17:37Z
dc.date.copyright © 2019
dc.date.issued 2020-03
dc.description This article is not available in the CUD collection. The version of record of this article is published in Information Sciences (2020), available online at: https://doi.org/10.1016/j.ins.2019.11.004 en_US
dc.description.abstract The advent of Big Data has ushered in a new era of scientific breakthroughs. One of the common issues that affects raw data is the class imbalance problem, which refers to an imbalanced distribution of values of the response variable. This issue is present in fraud detection, network intrusion detection, medical diagnostics, and a number of other fields where negatively labeled instances significantly outnumber positively labeled instances. Modern machine learning techniques struggle to deal with imbalanced data by focusing on minimizing the error rate for the majority class while ignoring the minority class. The goal of our paper is to demonstrate the effects of class imbalance on classification models. Concretely, we study the impact of varying class imbalance ratios on classifier accuracy. By highlighting the precise nature of the relationship between the degree of class imbalance and the corresponding effects on classifier performance, we hope to help researchers better tackle the problem. To this end, we carry out extensive experiments using 10-fold cross-validation on a large number of datasets. In particular, we determine that the relationship between the class imbalance ratio and the accuracy is convex. © 2019 Elsevier Inc. en_US
dc.identifier.citation Thabtah, F., Hammoud, S., Kamalov, F., & Gonsalves, A. (2020). Data imbalance in classification: Experimental evaluation. Information Sciences, 513, 429–441. https://doi.org/10.1016/j.ins.2019.11.004 en_US
dc.identifier.issn 00200255
dc.identifier.uri http://dx.doi.org/10.1016/j.ins.2019.11.004
dc.identifier.uri http://hdl.handle.net/20.500.12519/351
dc.language.iso en en_US
dc.publisher Elsevier Inc. en_US
dc.relation Authors Affiliations : Thabtah, F., Manukau Institute of Technology, Corner of Manukau Station Road, Davies Ave, Manukau, Auckland 2104, New Zealand; Hammoud, S., University of Kalamoon, Deir Atiyah An-Nabek District Rif Dimashq Governorate in Syria, Syrian Arab Republic; Kamalov, F., Canadian University Dubai, The Interchange, Sheikh Zayed Road, Dubai, United Arab Emirates; Gonsalves, A., Manukau Institute of Technology, Corner of Manukau Station Road, Davies Ave, Manukau, Auckland 2104, New Zealand
dc.relation.ispartofseries Information Sciences;Volume 513
dc.rights License to reuse the abstract has been secured from Elsevier and Copyright Clearance Center.
dc.rights.holder Copyright : © 2019 Elsevier Inc.
dc.rights.uri https://s100.copyright.com/CustomerAdmin/PLF.jsp?ref=f6e0cee0-5d8c-4198-88df-8aeb57b3c846
dc.subject Class imbalance en_US
dc.subject Classification en_US
dc.subject Data analysis en_US
dc.subject Machine learning en_US
dc.subject Statistical analysis en_US
dc.subject Supervised learning en_US
dc.subject Data reduction en_US
dc.subject Diagnosis en_US
dc.subject Intrusion detection en_US
dc.subject Large dataset en_US
dc.subject Learning systems en_US
dc.subject Statistical methods en_US
dc.subject 10-fold cross-validation en_US
dc.subject Class imbalance problems en_US
dc.subject Classification models en_US
dc.subject Classifier performance en_US
dc.subject Experimental evaluation en_US
dc.subject Network intrusion detection en_US
dc.subject Scientific breakthrough en_US
dc.subject Classification (of information) en_US
dc.title Data imbalance in classification : experimental evaluation en_US
dc.type Article en_US