Towards Next Generation Large-Scale Network-Based Intrusion Detection Systems: A Machine Learning Perspective

  • Omar Al-Jarrah

Student thesis: Doctoral Thesis


Emerging technologies such as the Internet of Things (IoT), together with disappearing network boundaries and increasingly sophisticated attacks, have elevated the risk of network intrusions. An Intrusion Detection System (IDS) is a critical component of a network's defense infrastructure: it monitors and analyzes network traffic to detect potentially malicious activity. Recently, Machine Learning (ML) models have been adopted for IDSs because of their ability to learn from observed examples and their flexibility to generalize to unseen ones. However, as the volume of data grows, so does the computational cost of conventional ML models: the big data dilemma. Therefore, not only the detection accuracy but also the efficiency and scalability of an IDS are important. The purpose of this thesis is to design and develop efficient and scalable ML models for large-scale IDSs. In more detail, this thesis 1) reviews the theoretical background of ML and IDSs and identifies the shortcomings of existing IDSs (Chapters 2 and 3); and 2) presents the design and development of a botnet detection model that handles large volumes of network data at a lower computational cost while maintaining its detection accuracy. In particular, we propose i) a detection model that detects botnet attacks based on network-traffic flow characteristics, regardless of packet contents or payload; and ii) novel data condensation methods based on a modified forward selection ranking technique and a data reduction technique that uses Voronoi-based data partitioning (Chapter 4). Furthermore, this thesis 3) presents the design and development of the Multi-Layered Clustering Model (MLCM) and Semi-supervised Multi-Layered Clustering (SMLC) for network intrusion detection tasks (Chapter 5). We demonstrate that the proposed models mitigate the deployment issues of existing ML-based IDSs, as they achieve comparable performance with partially labeled data.
Importantly, our framework can serve as a basis for the development of next-generation ML-based IDSs in which efficiency and scalability are the primary objectives.

Indexing Terms: Intrusion detection, machine learning, ensemble models, multi-layered clustering, semi-supervised learning.
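The abstract names a data reduction technique based on Voronoi partitioning but does not describe it. Purely as an illustration of the general idea, and not the thesis's actual algorithm, the sketch below condenses a labeled training set: it places a few sites with Lloyd (k-means) iterations, assigns every sample to the Voronoi cell of its nearest site, and keeps one representative per class in each cell. The function `voronoi_condense`, the k-means choice of sites, and the "closest to the per-class cell mean" representative rule are all hypothetical assumptions made for this example.

```python
import math
import random
from collections import defaultdict


def nearest_site(point, sites):
    """Index of the site whose Voronoi cell contains `point` (Euclidean distance)."""
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))


def voronoi_condense(points, labels, n_sites, n_iter=10, seed=0):
    """Hypothetical Voronoi-based condensation sketch (not the thesis's method).

    1. Place `n_sites` sites using a few Lloyd (k-means) iterations.
    2. Partition the samples into the Voronoi cells of those sites.
    3. In every cell, keep one sample per class: the member closest
       to that class's mean within the cell.
    """
    rng = random.Random(seed)
    sites = [list(p) for p in rng.sample(points, n_sites)]
    for _ in range(n_iter):
        cells = defaultdict(list)
        for p in points:
            cells[nearest_site(p, sites)].append(p)
        for i, members in cells.items():
            # Lloyd update: move each non-empty cell's site to its centroid.
            sites[i] = [sum(c) / len(members) for c in zip(*members)]

    # Final partition, grouped by (Voronoi cell, class label).
    groups = defaultdict(list)
    for p, y in zip(points, labels):
        groups[(nearest_site(p, sites), y)].append(p)

    reduced_x, reduced_y = [], []
    for (_, y), members in groups.items():
        mean = [sum(c) / len(members) for c in zip(*members)]
        reduced_x.append(min(members, key=lambda p: math.dist(p, mean)))
        reduced_y.append(y)
    return reduced_x, reduced_y


# Toy flow features (two made-up numeric features per flow).
flows = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.18), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels = ["benign", "benign", "benign", "bot", "bot", "bot"]
x, y = voronoi_condense(flows, labels, n_sites=2)
```

Because each (cell, class) group contributes exactly one original sample, the reduced set never exceeds `n_sites` times the number of classes and always retains every class that was present, which is the property a condensation step needs before training a cheaper classifier.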
Date of Award: Sep 2016
Original language: American English
Supervisor: Sami Muhaidat


  • Intrusion detection
  • machine learning
  • ensemble models
  • multi-layered clustering
  • semi-supervised learning
