Semi-supervised multi-layered clustering model for intrusion detection

Omar Y. Al-Jarrah, Yousof Al-Hammdi, Paul D. Yoo, Sami Muhaidat, Mahmoud Al-Qutayri

Research output: Contribution to journal › Article › peer-review

43 Scopus citations


A Machine Learning (ML)-based Intrusion Detection and Prevention System (IDPS) requires a large amount of labeled, up-to-date training data to detect intrusions effectively and generalize well to novel attacks. However, labeling data is costly and becomes infeasible when dealing with big data, such as that generated by Internet of Things applications. To this end, building an ML model that learns from unlabeled or partially labeled data is of critical importance. This paper proposes a Semi-supervised Multi-Layered Clustering (SMLC) model for the detection and prevention of network intrusion. SMLC can learn from partially labeled data while achieving a detection performance comparable to that of supervised ML-based IDPS. The performance of SMLC is compared with that of a well-known semi-supervised model (tri-training) and of supervised ensemble ML models, namely RandomForest, Bagging, and AdaboostM1, on two benchmark network-intrusion datasets, NSL and Kyoto 2006+. Experimental results show that SMLC is superior to tri-training, providing a comparable detection accuracy with 20% fewer labeled training instances. Furthermore, our results demonstrate that our scheme has a detection accuracy comparable to that of the supervised ensemble models.

Original language: British English
Pages (from-to): 277-286
Number of pages: 10
Journal: Digital Communications and Networks
Issue number: 4
State: Published - Nov 2018


  • Big data
  • Classification
  • Ensembles
  • Machine learning
  • Semi-supervised intrusion detection

