Data imbalance in classification : experimental evaluation

dc.contributor.author: Thabtah, Fadi
dc.contributor.author: Hammoud, Suhel
dc.contributor.author: Kamalov, Firuz
dc.contributor.author: Gonsalves, Amanda
dc.date.accessioned: 2021-03-24T05:17:37Z
dc.date.available: 2021-03-24T05:17:37Z
dc.date.copyright: © 2019
dc.date.issued: 2020-03
dc.description: This article is not available in the CUD collection. The version of record is published in Information Sciences (2020), available online at: https://doi.org/10.1016/j.ins.2019.11.004 [en_US]
dc.description.abstract: The advent of Big Data has ushered in a new era of scientific breakthroughs. One of the common issues that affects raw data is the class imbalance problem, which refers to an imbalanced distribution of values of the response variable. This issue is present in fraud detection, network intrusion detection, medical diagnostics, and a number of other fields where negatively labeled instances significantly outnumber positively labeled instances. Modern machine learning techniques struggle with imbalanced data because they focus on minimizing the error rate for the majority class while ignoring the minority class. The goal of our paper is to demonstrate the effects of class imbalance on classification models. Concretely, we study the impact of varying class imbalance ratios on classifier accuracy. By highlighting the precise nature of the relationship between the degree of class imbalance and the corresponding effects on classifier performance, we hope to help researchers better tackle the problem. To this end, we carry out extensive experiments using 10-fold cross-validation on a large number of datasets. In particular, we determine that the relationship between the class imbalance ratio and the accuracy is convex. © 2019 Elsevier Inc. [en_US]
dc.identifier.citation: Thabtah, F., Hammoud, S., Kamalov, F., & Gonsalves, A. (2020). Data imbalance in classification: Experimental evaluation. Information Sciences, 513, 429–441. https://doi.org/10.1016/j.ins.2019.11.004 [en_US]
dc.identifier.issn: 00200255
dc.identifier.uri: http://dx.doi.org/10.1016/j.ins.2019.11.004
dc.identifier.uri: http://hdl.handle.net/20.500.12519/351
dc.language.iso: en [en_US]
dc.publisher: Elsevier Inc. [en_US]
dc.relation: Author affiliations: Thabtah, F., Manukau Institute of Technology, Corner of Manukau Station Road, Davies Ave, Manukau, Auckland 2104, New Zealand; Hammoud, S., University of Kalamoon, Deir Atiyah, An-Nabek District, Rif Dimashq Governorate, Syrian Arab Republic; Kamalov, F., Canadian University Dubai, The Interchange, Sheikh Zayed Road, Dubai, United Arab Emirates; Gonsalves, A., Manukau Institute of Technology, Corner of Manukau Station Road, Davies Ave, Manukau, Auckland 2104, New Zealand
dc.relation.ispartofseries: Information Sciences; Volume 513
dc.rights: License to reuse the abstract has been secured from Elsevier and Copyright Clearance Center.
dc.rights.holder: Copyright: © 2019 Elsevier Inc.
dc.rights.uri: https://s100.copyright.com/CustomerAdmin/PLF.jsp?ref=f6e0cee0-5d8c-4198-88df-8aeb57b3c846
dc.subject: Class imbalance [en_US]
dc.subject: Classification [en_US]
dc.subject: Data analysis [en_US]
dc.subject: Machine learning [en_US]
dc.subject: Statistical analysis [en_US]
dc.subject: Supervised learning [en_US]
dc.subject: Data reduction [en_US]
dc.subject: Diagnosis [en_US]
dc.subject: Intrusion detection [en_US]
dc.subject: Large dataset [en_US]
dc.subject: Learning systems [en_US]
dc.subject: Statistical methods [en_US]
dc.subject: 10-fold cross-validation [en_US]
dc.subject: Class imbalance problems [en_US]
dc.subject: Classification models [en_US]
dc.subject: Classifier performance [en_US]
dc.subject: Experimental evaluation [en_US]
dc.subject: Network intrusion detection [en_US]
dc.subject: Scientific breakthrough [en_US]
dc.subject: Classification (of information) [en_US]
dc.title: Data imbalance in classification : experimental evaluation [en_US]
dc.type: Article [en_US]