Data imbalance in classification : experimental evaluation
dc.contributor.author | Thabtah, Fadi | |
dc.contributor.author | Hammoud, Suhel | |
dc.contributor.author | Kamalov, Firuz | |
dc.contributor.author | Gonsalves, Amanda | |
dc.date.accessioned | 2021-03-24T05:17:37Z | |
dc.date.available | 2021-03-24T05:17:37Z | |
dc.date.copyright | © 2019 | |
dc.date.issued | 2020-03 | |
dc.description | This article is not available at CUD collection. The version of scholarly record of this article is published in Information Sciences (2020), available online at: https://doi.org/10.1016/j.ins.2019.11.004 | en_US |
dc.description.abstract | The advent of Big Data has ushered in a new era of scientific breakthroughs. One of the common issues that affects raw data is the class imbalance problem, which refers to an imbalanced distribution of values of the response variable. This issue is present in fraud detection, network intrusion detection, medical diagnostics, and a number of other fields where negatively labeled instances significantly outnumber positively labeled instances. Modern machine learning techniques struggle with imbalanced data because they tend to minimize the error rate for the majority class while ignoring the minority class. The goal of our paper is to demonstrate the effects of class imbalance on classification models. Concretely, we study the impact of varying class imbalance ratios on classifier accuracy. By highlighting the precise nature of the relationship between the degree of class imbalance and its effects on classifier performance, we hope to help researchers better tackle the problem. To this end, we carry out extensive experiments using 10-fold cross-validation on a large number of datasets. In particular, we find that the relationship between the class imbalance ratio and the accuracy is convex. © 2019 Elsevier Inc. | en_US |
dc.identifier.citation | Thabtah, F., Hammoud, S., Kamalov, F., & Gonsalves, A. (2020). Data imbalance in classification: Experimental evaluation. Information Sciences, 513, 429–441. https://doi.org/10.1016/j.ins.2019.11.004 | en_US |
dc.identifier.issn | 0020-0255 | |
dc.identifier.uri | http://dx.doi.org/10.1016/j.ins.2019.11.004 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12519/351 | |
dc.language.iso | en | en_US |
dc.publisher | Elsevier Inc. | en_US |
dc.relation | Author affiliations: Thabtah, F., Manukau Institute of Technology, Corner of Manukau Station Road, Davies Ave, Manukau, Auckland 2104, New Zealand; Hammoud, S., University of Kalamoon, Deir Atiyah, An-Nabek District, Rif Dimashq Governorate, Syrian Arab Republic; Kamalov, F., Canadian University Dubai, The Interchange, Sheikh Zayed Road, Dubai, United Arab Emirates; Gonsalves, A., Manukau Institute of Technology, Corner of Manukau Station Road, Davies Ave, Manukau, Auckland 2104, New Zealand | |
dc.relation.ispartofseries | Information Sciences;Volume 513 | |
dc.rights | License to reuse the abstract has been secured from Elsevier and Copyright Clearance Center. | |
dc.rights.holder | Copyright : © 2019 Elsevier Inc. | |
dc.rights.uri | https://s100.copyright.com/CustomerAdmin/PLF.jsp?ref=f6e0cee0-5d8c-4198-88df-8aeb57b3c846 | |
dc.subject | Class imbalance | en_US |
dc.subject | Classification | en_US |
dc.subject | Data analysis | en_US |
dc.subject | Machine learning | en_US |
dc.subject | Statistical analysis | en_US |
dc.subject | Supervised learning | en_US |
dc.subject | Data reduction | en_US |
dc.subject | Diagnosis | en_US |
dc.subject | Intrusion detection | en_US |
dc.subject | Large dataset | en_US |
dc.subject | Learning systems | en_US |
dc.subject | Statistical methods | en_US |
dc.subject | 10-fold cross-validation | en_US |
dc.subject | Class imbalance problems | en_US |
dc.subject | Classification models | en_US |
dc.subject | Classifier performance | en_US |
dc.subject | Experimental evaluation | en_US |
dc.subject | Network intrusion detection | en_US |
dc.subject | Scientific breakthrough | en_US |
dc.subject | Classification (of information) | en_US |
dc.title | Data imbalance in classification : experimental evaluation | en_US |
dc.type | Article | en_US |