Data imbalance in classification : experimental evaluation

dc.contributor.authorThabtah, Fadi
dc.contributor.authorHammoud, Suhel
dc.contributor.authorKamalov, Firuz
dc.contributor.authorGonsalves, Amanda
© 2019
dc.descriptionThis article is not available in the CUD collection. The version of record of this article is published in Information Sciences (2020), available online at:
dc.description.abstractThe advent of Big Data has ushered in a new era of scientific breakthroughs. One of the common issues that affects raw data is the class imbalance problem, which refers to an imbalanced distribution of values of the response variable. This issue is present in fraud detection, network intrusion detection, medical diagnostics, and a number of other fields where negatively labeled instances significantly outnumber positively labeled instances. Modern machine learning techniques struggle with imbalanced data because they focus on minimizing the error rate for the majority class while ignoring the minority class. The goal of our paper is to demonstrate the effects of class imbalance on classification models. Concretely, we study the impact of varying class imbalance ratios on classifier accuracy. By highlighting the precise nature of the relationship between the degree of class imbalance and the corresponding effects on classifier performance, we hope to help researchers better tackle the problem. To this end, we carry out extensive experiments using 10-fold cross-validation on a large number of datasets. In particular, we determine that the relationship between the class imbalance ratio and the accuracy is convex. © 2019 Elsevier Inc.en_US
dc.identifier.citationThabtah, F., Hammoud, S., Kamalov, F., & Gonsalves, A. (2020). Data imbalance in classification: Experimental evaluation. Information Sciences, 513, 429–441.
dc.publisherElsevier Inc.en_US
dc.relationAuthors Affiliations : Thabtah, F., Manukau Institute of Technology, Corner of Manukau Station Road, Davies Ave, Manukau, Auckland 2104, New Zealand; Hammoud, S., University of Kalamoon, Deir Atiyah An-Nabek District Rif Dimashq Governorate in Syria, Syrian Arab Republic; Kamalov, F., Canadian University Dubai, The Interchange, Sheikh Zayed Road, Dubai, United Arab Emirates; Gonsalves, A., Manukau Institute of Technology, Corner of Manukau Station Road, Davies Ave, Manukau, Auckland 2104, New Zealand
dc.relation.ispartofseriesInformation Sciences;Volume 513
dc.rightsLicense to reuse the abstract has been secured from Elsevier and Copyright Clearance Center.
dc.rights.holderCopyright : © 2019 Elsevier Inc.
dc.subjectClass imbalanceen_US
dc.subjectData analysisen_US
dc.subjectMachine learningen_US
dc.subjectStatistical analysisen_US
dc.subjectSupervised learningen_US
dc.subjectData reductionen_US
dc.subjectIntrusion detectionen_US
dc.subjectLarge dataseten_US
dc.subjectLearning systemsen_US
dc.subjectStatistical methodsen_US
dc.subject10-fold cross-validationen_US
dc.subjectClass imbalance problemsen_US
dc.subjectClassification modelsen_US
dc.subjectClassifier performanceen_US
dc.subjectExperimental evaluationen_US
dc.subjectNetwork intrusion detectionen_US
dc.subjectScientific breakthroughen_US
dc.subjectClassification (of information)en_US
dc.titleData imbalance in classification : experimental evaluationen_US