TY - GEN
T1 - Ensemble Approach to Classify Spam SMS from Bengali Text
AU - Al Maruf, Abdullah
AU - Al Numan, Abdullah
AU - Haque, Md Mahmudul
AU - Jidney, Tasmia Tahmida
AU - Aung, Zeyar
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
AB - The Short Message Service (SMS) is a popular communication tool, but it has some security weaknesses, such as the influx of spam messages from cyber criminals. While several studies have been conducted on filtering and categorizing spam messages in various languages, including English, limited research has been done on detecting spam in Bengali (endonym Bangla) text. This study aims to fill this gap by classifying Bengali SMS messages as either spam or ham (legitimate messages). To accomplish this, the study used machine learning algorithms as baseline models: a support vector machine (SVM) with a linear kernel, and decision tree (DT), logistic regression (LR), and random forest (RF) classifiers with various parameters. Ensemble approaches, such as bagging, boosting, and stacking, were then used to enhance the performance of the models. The results show that the ensemble approach successfully identified spam messages in Bengali text, with XGBoost producing the most favorable outcome. The contribution of this study lies in its focus on Bengali text and its demonstration of how ensemble methods perform on a small dataset. The tool developed in this study can provide a secure and efficient SMS service to customers by reducing the burden of spam messages and improving the overall user experience. Additionally, the tool can be marketed as a value-added service for customers who are concerned about the security of their personal and financial information. Overall, this study highlights the importance of machine learning algorithms, specifically ensemble methods, in detecting spam messages in Bengali text and provides a valuable contribution to the field of SMS security.
KW - Bengali Text
KW - Classification
KW - Ensemble
KW - Ensemble Method
KW - Machine Learning
KW - SPAM SMS
UR - http://www.scopus.com/inward/record.url?scp=85172188504&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-37940-6_36
DO - 10.1007/978-3-031-37940-6_36
M3 - Conference contribution
AN - SCOPUS:85172188504
SN - 9783031379390
T3 - Communications in Computer and Information Science
SP - 440
EP - 453
BT - Advances in Computing and Data Sciences - 7th International Conference, ICACDS 2023, Revised Selected Papers
A2 - Singh, Mayank
A2 - Tyagi, Vipin
A2 - Gupta, P.K.
A2 - Flusser, Jan
A2 - Ören, Tuncer
PB - Springer Science and Business Media Deutschland GmbH
T2 - Proceedings of the 7th International Conference on Advances in Computing and Data Sciences, ICACDS 2023
Y2 - 27 April 2023 through 28 April 2023
ER -