Beating Gradient Boosting: Target-Guided Binning for Massively Scalable Classification in Real-Time

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    Abstract

    Gradient Boosting (GB) consistently outperforms other ML predictors, especially in binary classification on multi-modal data of different forms and types. Its newest efficient implementations, including XGBoost, LightGBM and CatBoost, push GB even further ahead with fast GPU-accelerated compute engines and optimized handling of categorical features. In an attempt to beat GB in both predictive performance and processing speed, we propose a new, simple yet fast and robust classification model based on predictive binning. First, all features undergo massively parallelized binning into a unified, ordinally compressed risk representation, with each feature independently optimized to maximize the AUC score against the target. The resulting array of summarized micro-predictors, which resemble 0-depth decision trees and directly express the ordinally represented target risk, is then passed through greedy feature selection to compose a robust wide-margin voting classifier. Its performance can beat GB, while its extreme build and execution speed, together with a highly compressed representation, accommodates extreme data sizes and real-time use. The model was applied to detect cyber-security attacks on IoT devices in the FedCSIS'2023 Challenge and scored 2nd place with AUC ≈ 1, leaving behind all the latest GB variants in performance and speed.
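
The pipeline described in the abstract (per-feature binning into an ordinal risk score, then greedy selection into a voting classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`auc`, `fit_binner`, `transform`, `greedy_select`), the fixed quantile bin edges, the smoothed positive rate used as the bin risk, and the synthetic data are all assumptions made for illustration. The paper optimizes the binning itself against AUC, which this sketch does not attempt.

```python
from bisect import bisect_right
import random


def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U), using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def fit_binner(values, labels, n_bins=8):
    """Assumed scheme: quantile edges, each bin scored by a smoothed positive rate."""
    ordered = sorted(values)
    edges = sorted({ordered[len(ordered) * q // n_bins] for q in range(1, n_bins)})
    pos = [0] * (len(edges) + 1)
    cnt = [0] * (len(edges) + 1)
    for v, y in zip(values, labels):
        b = bisect_right(edges, v)
        pos[b] += y
        cnt[b] += 1
    prior = sum(labels) / len(labels)
    # One pseudo-observation at the global rate keeps sparse bins from extreme risks.
    risks = [(p + prior) / (c + 1) for p, c in zip(pos, cnt)]
    return edges, risks


def transform(values, binner):
    """Map raw feature values to the risk of the bin they fall into."""
    edges, risks = binner
    return [risks[bisect_right(edges, v)] for v in values]


def greedy_select(columns, labels, max_features=10):
    """Forward selection: add the risk column that most improves the AUC of the summed vote."""
    chosen, best_auc = [], 0.0
    votes = [0.0] * len(labels)
    while len(chosen) < max_features:
        best = None
        for j, col in enumerate(columns):
            if j in chosen:
                continue
            candidate = [v + c for v, c in zip(votes, col)]
            a = auc(candidate, labels)
            if a > best_auc:
                best, best_auc = j, a
        if best is None:  # no remaining column improves the vote
            break
        chosen.append(best)
        votes = [v + c for v, c in zip(votes, columns[best])]
    return chosen, best_auc


# Synthetic demo: one informative feature and one pure-noise feature.
random.seed(0)
y = [random.randint(0, 1) for _ in range(400)]
x_signal = [label + random.gauss(0, 0.5) for label in y]
x_noise = [random.gauss(0, 1) for _ in y]
columns = [transform(x, fit_binner(x, y)) for x in (x_signal, x_noise)]
chosen, score = greedy_select(columns, y)
print(chosen, round(score, 3))
```

Each feature is binned independently of the others, so the `fit_binner` calls could run in parallel, which is the property the abstract relies on for scalability. The scores here are measured on the training data only; a real evaluation would use a held-out set.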

    Original language: British English
    Title of host publication: Proceedings of the 18th Conference on Computer Science and Intelligence Systems, FedCSIS 2023
    Editors: Maria Ganzha, Leszek Maciaszek, Marcin Paprzycki, Dominik Slezak
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 1301-1306
    Number of pages: 6
    ISBN (Electronic): 9788396744784
    DOIs
    State: Published - 2023
    Event: 18th Conference on Computer Science and Intelligence Systems, FedCSIS 2023 - Warsaw, Poland
    Duration: 17 Sep 2023 to 20 Sep 2023

    Publication series

    Name: Proceedings of the 18th Conference on Computer Science and Intelligence Systems, FedCSIS 2023

    Conference

    Conference: 18th Conference on Computer Science and Intelligence Systems, FedCSIS 2023
    Country/Territory: Poland
    City: Warsaw
    Period: 17/09/23 to 20/09/23
