Ensemble Approach to Classify Spam SMS from Bengali Text

Abdullah Al Maruf, Abdullah Al Numan, Md Mahmudul Haque, Tasmia Tahmida Jidney, Zeyar Aung

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    5 Scopus citations

    Abstract

    The Short Message Service (SMS) is a popular communication tool, but it has some security weaknesses, such as the influx of spam messages from cyber criminals. While several studies have been conducted on filtering and categorizing spam messages in various languages, including English, limited research has been done on detecting spam in Bengali (endonym Bangla) text. This study aims to fill this gap by classifying Bengali SMS messages as either spam or ham (legitimate messages). To accomplish this, the study used machine learning algorithms, including support vector machine (SVM) with a linear kernel and decision tree (DT), logistic regression (LR), and random forest (RF) with various parameters, as baseline models. Ensemble approaches, such as bagging, boosting, and stacking, were then used to enhance the performance of the models. The results show that the ensemble approach successfully identified spam messages in Bengali text, with XGBoost producing the most favorable outcome. The contribution of this study lies in its focus on Bengali text and the demonstration of the ensemble method’s performance on a small dataset. The tool developed in this study can provide a secure and efficient SMS service to customers by reducing the burden of spam messages and improving the overall user experience. Additionally, the tool can be marketed as a value-added service for customers who are concerned about the security of their personal and financial information. Overall, this study highlights the importance of machine learning algorithms, specifically ensemble methods, in detecting spam messages in Bengali text and provides a valuable contribution to the field of SMS security.

    Original language: British English
    Title of host publication: Advances in Computing and Data Sciences - 7th International Conference, ICACDS 2023, Revised Selected Papers
    Editors: Mayank Singh, Vipin Tyagi, P.K. Gupta, Jan Flusser, Tuncer Ören
    Publisher: Springer Science and Business Media Deutschland GmbH
    Pages: 440-453
    Number of pages: 14
    ISBN (Print): 9783031379390
    DOIs
    State: Published - 2023
    Event: Proceedings of the 7th International Conference on Advances in Computing and Data Sciences, ICACDS 2023 - Kolkata, India
    Duration: 27 Apr 2023 to 28 Apr 2023

    Publication series

    Name: Communications in Computer and Information Science
    Volume: 1848 CCIS
    ISSN (Print): 1865-0929
    ISSN (Electronic): 1865-0937

    Conference

    Conference: Proceedings of the 7th International Conference on Advances in Computing and Data Sciences, ICACDS 2023
    Country/Territory: India
    City: Kolkata
    Period: 27/04/23 to 28/04/23

    Keywords

    • Bengali Text
    • Classification
    • Ensemble
    • Ensemble Method
    • Machine Learning
    • SPAM SMS
