Abstract
Many natural processes generate some observations more frequently than others. Such processes produce imbalanced class distributions, which bias classifiers toward the majority class because most classifiers assume a balanced class distribution. To address class imbalance, a number of data preprocessing techniques have been proposed over the years; they can be broadly categorized into over-sampling and under-sampling methods. The neighborhood cleaning rule (NCL) proposed by Laurikkala is among the most popular under-sampling methods. In this paper, we augment the original NCL algorithm by removing unwanted samples with the CHC evolutionary algorithm instead of the simple nearest-neighbor-based cleaning used in NCL. We call the augmented algorithm NCL+. The performance of NCL+ is compared with that of NCL on 9 imbalanced datasets using 11 different classifiers. Experimental results show noticeable accuracy improvements of NCL+ over NCL. NCL+ is also compared with the popular over-sampling method synthetic minority over-sampling technique (SMOTE) and is found to offer better results as well.
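For readers unfamiliar with the baseline, the nearest-neighbor-based cleaning that the abstract attributes to NCL can be sketched roughly as follows. This is a simplified illustration of the idea as it is commonly described, not the authors' implementation, and it does not show the CHC evolutionary step of NCL+. The function names `k_nearest` and `neighborhood_clean`, the choice of `k=3`, and the Euclidean distance are assumptions made for the example; the class-size condition of the original rule is omitted.

```python
import math
from collections import Counter


def k_nearest(i, X, k):
    """Indices of the k samples closest to X[i] (Euclidean), excluding i itself."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in dists[:k]]


def neighborhood_clean(X, y, minority, k=3):
    """Return the indices kept after one nearest-neighbor cleaning pass.

    Simplified sketch (assumption): only majority-class samples are ever removed.
    """
    remove = set()
    for i in range(len(X)):
        nbrs = k_nearest(i, X, k)
        # Majority vote among the k nearest neighbors of sample i.
        vote = Counter(y[j] for j in nbrs).most_common(1)[0][0]
        if vote == y[i]:
            continue  # neighbors agree with the label, nothing to clean
        if y[i] != minority:
            # A majority-class sample contradicted by its neighbors is dropped.
            remove.add(i)
        else:
            # A misclassified minority sample is kept; the majority-class
            # neighbors that caused the misclassification are dropped instead.
            remove.update(j for j in nbrs if y[j] != minority)
    return [i for i in range(len(X)) if i not in remove]


# Toy data: three minority points (label 1), one noisy majority point
# sitting among them, and a separate majority cluster (label 0).
X = [(0, 0), (0, 1), (1, 0), (0.5, 0.5), (5, 5), (5, 6), (6, 5), (6, 6)]
y = [1, 1, 1, 0, 0, 0, 0, 0]
kept = neighborhood_clean(X, y, minority=1)
print(kept)
```

On this toy data only the noisy majority point at index 3 is removed, so the minority class is left untouched while the overlap region is cleaned.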
Original language | British English |
---|---|
Pages (from-to) | 827-834 |
Number of pages | 8 |
Journal | Lecture Notes in Electrical Engineering |
Volume | 339 |
State | Published - 2015 |
Keywords
- Class Imbalance
- Data Preprocessing
- Evolutionary Algorithm
- Neighborhood Cleaning
- Under-Sampling