Natural processes often generate some observations more frequently than others. These processes produce unbalanced distributions, which bias classifiers toward the majority class, especially because most classifiers assume a normal distribution. The number and diversity of imbalanced application domains motivate the research community to address imbalanced dataset classification, and imbalanced datasets are therefore attracting increasing attention in the field of classification. In this work, we address the necessity of adapting data pre-processing models in the framework of binary imbalanced datasets, focusing on the synergy with different cost-sensitive and class-imbalance classification algorithms. The results of this empirical study favored the Synthetic Minority Over-sampling Technique (SMOTE) when the Imbalance Ratio (IR) was relatively high, and favored the Neighborhood Cleaning Rule (NCL) when the IR was relatively small. A further improvement is proposed to enhance the scalability of NCL with IR; the proposed method is named NCL+. The results showed that NCL+ outperformed NCL, especially on datasets with relatively high IR.
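To make the two central terms of the abstract concrete, the following is a minimal sketch, not the implementation used in the thesis. It assumes the common definition of the Imbalance Ratio as the majority-class count divided by the minority-class count, and it illustrates the core idea of SMOTE (interpolating between a minority sample and one of its nearest minority neighbours). The function names `imbalance_ratio` and `smote_like`, the toy data, and the parameter values are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style sketch)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy data (hypothetical): 12 majority points (label 0), 4 minority points (label 1).
majority = [(float(i), float(i % 3)) for i in range(12)]
minority = [(0.0, 5.0), (1.0, 6.0), (2.0, 5.5), (1.5, 7.0)]
labels = [0] * len(majority) + [1] * len(minority)

ir_before = imbalance_ratio(labels)

# Oversample the minority class until both classes have the same size.
new_points = smote_like(minority, n_new=len(majority) - len(minority))
labels_after = labels + [1] * len(new_points)
ir_after = imbalance_ratio(labels_after)

print(ir_before, ir_after)  # 3.0 1.0
```

In this toy case the IR falls from 3.0 to 1.0 after oversampling. NCL works in the opposite direction, removing majority-class samples near the class boundary rather than adding minority samples; it is not sketched here.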
| Field | Value |
|---|---|
| Date of Award | 2014 |
| Original language | American English |
| Supervisor | U Zeyar Aung (Supervisor) |
- Data processing; Binary Classification; Imbalanced Data.
Handling Class Imbalance Problem in Binary Classification
Alabdouli, N. (Author). 2014
Student thesis: Master's Thesis