Robust method of sparse feature selection for multi-label classification with Naive Bayes

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

12 Scopus citations

Abstract

The explosive growth of big data poses a processing challenge for predictive systems in terms of both data size and dimensionality. Generating features from text often yields many thousands of sparse features that rarely take non-zero values. In this work we propose a very fast and robust feature selection method that is optimised with the Naive Bayes classifier. The method takes advantage of the sparse feature representation and uses a diversified backward-forward greedy search to arrive at a highly competitive solution in minimal processing time. It promotes the paradigm of shifting the complexity of predictive systems away from the model algorithm and towards careful data preprocessing and filtering, which makes it possible to accomplish predictive big data tasks on a single processor despite billions of data examples nominally exposed for processing. The method was applied to the AAIA Data Mining Competition 2014, which concerned predicting human injuries resulting from fire incidents based on nearly 12,000 risk factors extracted from thousands of fire incident reports, and took second place with a predictive accuracy of 96%.
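The general idea of wrapping a greedy search around Naive Bayes on sparse features can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the search is diversified, so the code below uses a plain alternating forward-backward search, a Bernoulli Naive Bayes with Laplace smoothing, and training accuracy as the selection criterion, all of which are assumptions. The function names (`fit_nb`, `accuracy`, `greedy_select`) and the toy data are hypothetical.

```python
import math

def fit_nb(rows, labels, n_features, alpha=1.0):
    """Fit per-feature Bernoulli Naive Bayes log-odds terms.

    rows: list of sets holding the indices of non-zero features (sparse form).
    Returns (prior, on, off): on[f] / off[f] is the log-odds contribution
    of feature f when it is present / absent.
    """
    n = [labels.count(0), labels.count(1)]
    counts = [[0] * n_features, [0] * n_features]
    for row, y in zip(rows, labels):
        for f in row:  # only non-zero entries are visited
            counts[y][f] += 1
    on, off = [], []
    for f in range(n_features):
        p0 = (counts[0][f] + alpha) / (n[0] + 2 * alpha)
        p1 = (counts[1][f] + alpha) / (n[1] + 2 * alpha)
        on.append(math.log(p1 / p0))
        off.append(math.log((1 - p1) / (1 - p0)))
    prior = math.log((n[1] + alpha) / (n[0] + alpha))
    return prior, on, off

def accuracy(rows, labels, model, selected):
    """Accuracy of the Naive Bayes decision restricted to `selected`."""
    prior, on, off = model
    # Score of an all-zero row; each row then corrects it for its active features.
    base = prior + sum(off[f] for f in selected)
    hits = 0
    for row, y in zip(rows, labels):
        score = base + sum(on[f] - off[f] for f in row if f in selected)
        hits += int((score > 0) == (y == 1))
    return hits / len(rows)

def greedy_select(rows, labels, n_features):
    """Alternate forward additions and backward removals until no gain."""
    model = fit_nb(rows, labels, n_features)
    selected = set()
    best = accuracy(rows, labels, model, selected)
    improved = True
    while improved:
        improved = False
        # Forward step: add the single feature that raises accuracy the most.
        candidates = [(accuracy(rows, labels, model, selected | {f}), f)
                      for f in range(n_features) if f not in selected]
        if candidates:
            acc, f = max(candidates, key=lambda t: (t[0], -t[1]))
            if acc > best:
                selected.add(f)
                best, improved = acc, True
        # Backward step: drop any feature whose removal raises accuracy.
        for f in sorted(selected):
            acc = accuracy(rows, labels, model, selected - {f})
            if acc > best:
                selected.remove(f)
                best, improved = acc, True
    return selected, best

# Toy data (hypothetical): feature 0 separates the classes, 1 and 2 are noise.
rows = [{0, 1}, {0}, {0, 2}, {1}, {2}, set()]
labels = [1, 1, 1, 0, 0, 0]
selected, acc = greedy_select(rows, labels, n_features=3)
print(selected, acc)  # {0} 1.0
```

Because each example's score starts from a shared all-zero baseline and is corrected only for its active features, the cost of scoring a candidate subset grows with the number of non-zero entries rather than with the full feature count. That is one plausible way to benefit from the sparse representation the abstract mentions.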

Original language: British English
Title of host publication: 2014 Federated Conference on Computer Science and Information Systems, FedCSIS 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 375-380
Number of pages: 6
ISBN (Electronic): 9788360810583
DOIs
State: Published - 21 Oct 2014
Event: 2014 Federated Conference on Computer Science and Information Systems, FedCSIS 2014 - Warsaw, Poland
Duration: 7 Sep 2014 to 10 Sep 2014

Publication series

Name: 2014 Federated Conference on Computer Science and Information Systems, FedCSIS 2014

Conference

Conference: 2014 Federated Conference on Computer Science and Information Systems, FedCSIS 2014
Country/Territory: Poland
City: Warsaw
Period: 7/09/14 to 10/09/14
