Feature Selection Methods: A Proposed Framework for Analysis and Evaluation

  • Khulood Al Junaibi

Student thesis: Master's Thesis

Abstract

The reduction in the cost of computational power and storage devices has led to the collection of ever-increasing amounts of information on a daily basis. Sifting through this information to find only the relevant features is achieved through a process known as 'feature selection', which has recently become an extremely popular area of data mining research. As a result, there is a very large variety of feature selection algorithms, some of which perform better than others. The objective of this project is to address the problem of evaluating the performance of different feature selection algorithms. In this context, the most commonly used metric is the accuracy of the classifiers built using the selected feature subsets. We contend that while accuracy is an important performance characteristic, it is not sufficient simply to choose the feature selector whose subsets yield the most accurate classifier. In this thesis, we present a novel framework that measures the performance of a feature selector under a range of different operating conditions. Specifically, we identify three factors that are known to affect performance: the level of noise present in the data, the inclusion of redundant features, and missing data. In addition, we propose consistency as a performance metric to be used alongside accuracy. In our experiments, we combine all of these elements to obtain a clearer picture of feature selector performance than would have been possible using conventional means. The results of these experiments are presented in detail and indicate that different feature selection algorithms are better suited to different operating conditions.
More importantly, it was also observed that using accuracy as the only evaluation metric may not provide the whole picture, and that supplementing it with the consistency measure can help to increase the reliability of the feature selection process.
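The abstract does not specify how consistency is computed or how the three operating conditions are simulated, so the following is only a minimal sketch under stated assumptions. It assumes consistency means the mean pairwise Jaccard similarity between the feature subsets selected over repeated runs on perturbed copies of the data (one common stability measure, not necessarily the definition used in the thesis). The correlation-based `select_top_k` selector, the perturbation helpers `add_noise`, `add_redundant` and `inject_missing`, the toy data generator `make_data` and all parameter values are hypothetical illustrations; classifier accuracy is not computed here.

```python
import itertools
import random
from typing import Callable, List, Sequence, Set

Matrix = List[List[float]]


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two sets of feature indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def consistency(subsets: Sequence[Set[int]]) -> float:
    """Assumed consistency measure: mean pairwise Jaccard similarity (1.0 = identical subsets)."""
    pairs = list(itertools.combinations(subsets, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_top_k(X: Matrix, y: Sequence[float], k: int) -> Set[int]:
    """Hypothetical filter selector: keep the k features most correlated with y."""
    cols = list(zip(*X))
    scores = [abs(pearson(col, y)) for col in cols]
    ranked = sorted(range(len(cols)), key=lambda j: -scores[j])
    return set(ranked[:k])


def add_noise(X: Matrix, sigma: float, rng: random.Random) -> Matrix:
    """Condition 1 (assumed form): additive Gaussian noise on every feature value."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in X]


def add_redundant(X: Matrix, n_copies: int, rng: random.Random) -> Matrix:
    """Condition 2 (assumed form): append near-duplicate copies of randomly chosen columns."""
    src = [rng.randrange(len(X[0])) for _ in range(n_copies)]
    return [row + [row[j] + rng.gauss(0.0, 0.01) for j in src] for row in X]


def inject_missing(X: Matrix, rate: float, rng: random.Random) -> Matrix:
    """Condition 3 (assumed form): blank out values at random, then mean-impute each column."""
    masked = [[None if rng.random() < rate else v for v in row] for row in X]
    means = []
    for col in zip(*masked):
        present = [v for v in col if v is not None]
        means.append(sum(present) / len(present) if present else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in masked]


def evaluate(X: Matrix, y: Sequence[float], k: int,
             perturb: Callable[[Matrix, random.Random], Matrix],
             runs: int = 10, seed: int = 0) -> float:
    """Run the selector on `runs` perturbed copies of X and score subset consistency."""
    rng = random.Random(seed)
    subsets = [select_top_k(perturb(X, rng), y, k) for _ in range(runs)]
    return consistency(subsets)


def make_data(n: int = 200, d: int = 8, seed: int = 42):
    """Toy data: only features 0 and 1 influence the target."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [row[0] + 0.5 * row[1] + rng.gauss(0.0, 0.5) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    conditions = {
        "noise": lambda m, rng: add_noise(m, 1.0, rng),
        "redundant": lambda m, rng: add_redundant(m, 4, rng),
        "missing": lambda m, rng: inject_missing(m, 0.2, rng),
    }
    for name, perturb in conditions.items():
        print(f"{name}: consistency = {evaluate(X, y, 2, perturb):.2f}")
```

A score near 1.0 means the selector keeps choosing the same features under that condition; lower scores flag conditions under which the selected subset is unstable, which an accuracy figure on its own would not reveal.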
Date of Award: 2012
Original language: American English
Supervisor: Wei Lee Woon

Keywords

  • Data mining
  • Data processing
