Machine Learning-based Data Exfiltration Control: From Exfiltration Detection to Privacy Protection Techniques

  • Ghebrebrhan Gebrehans

Student thesis: Master's Thesis

Abstract

The first part of this work focuses on data exfiltration detection in DNS queries. We propose an efficient and explainable machine learning-based data exfiltration detection method using 2D convolutional neural networks. However, as cybercriminals' attack tactics and strategies continuously evolve and grow more sophisticated over time, multiple attack vectors are employed to exfiltrate data from organizations. One of the most common attack vectors used in data exfiltration and breaches is phishing emails or URLs, which serve as a channel for malware delivery. Likewise, malicious actors use domain generation algorithms (DGAs) to generate a large number of unique domain names in order to communicate with infected hosts (i.e., botnets), evade security countermeasures, and register new domain names on the fly to avoid takedown by law enforcement. These actors use DGAs to establish command and control (C&C) servers that communicate with the malware on infected hosts and transfer stolen data. Therefore, the second part of this thesis presents an alternative solution that addresses not only data exfiltration in a network but also the attack vectors (i.e., DGAs and phishing URLs) that are closely tied to it. To this end, a character-level attentive convolutional transformer (ACT) is implemented. Lastly, while data exfiltration detection may be an effective way for companies and individuals to protect against data theft and preserve data privacy at the same time, this may no longer hold with the emergence and adoption of new technologies such as ChatGPT and related AI-powered chatbots. Some organizations use these AI tools in their day-to-day work. Hence, a privacy protection method that prevents sensitive information from being revealed to external entities is critical.
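The abstract does not give implementation details, but the character-level input encoding that 2D-CNN detectors of this kind typically rely on can be sketched as follows. The alphabet, maximum length, and function names here are illustrative assumptions, not the thesis's actual design.

```python
# Hypothetical sketch: encoding a domain name as a fixed-size 2D one-hot
# matrix, the kind of input a character-level 2D-CNN detector could consume.
# The alphabet and MAX_LEN are illustrative choices, not the thesis's values.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._"
CHAR_TO_IDX = {c: i for i, c in enumerate(ALPHABET)}
MAX_LEN = 64  # pad or truncate every domain to a fixed width

def encode_domain(domain: str) -> list:
    """One-hot encode a domain into a (MAX_LEN x len(ALPHABET)) matrix."""
    matrix = [[0] * len(ALPHABET) for _ in range(MAX_LEN)]
    for row, ch in enumerate(domain.lower()[:MAX_LEN]):
        col = CHAR_TO_IDX.get(ch)
        if col is not None:  # silently skip characters outside the alphabet
            matrix[row][col] = 1
    return matrix

m = encode_domain("exfil-data.example.com")
```

Such a matrix can then be fed to a stack of 2D convolutions, letting the network learn character-pattern features (entropy-like irregularities, unusual n-grams) that distinguish DGA-generated or exfiltration-carrying names from benign ones.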
If not handled properly, sharing prompts that carry sensitive information with ChatGPT or related tools can expose companies (or individuals) to privacy issues, since the service providers and their affiliated companies can access that data. Given that companies and individuals want to consume the services offered by AI chatbots, they need a practical privacy protection mechanism. The third part of this work therefore addresses this problem and proposes a privacy protection method that can be adopted while using ChatGPT and related tools. The approach relies on a semantic ontology (taxonomy) to generate privacy-protected prompts and achieves promising results.
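The core idea of taxonomy-based prompt protection, as described above, is to replace sensitive terms with more generic ancestors before a prompt leaves the organization. A minimal sketch of that substitution step follows; the taxonomy entries, term names, and matching strategy are invented for illustration and are not taken from the thesis.

```python
# Hypothetical sketch: generalizing sensitive terms in a prompt using a tiny
# hand-made taxonomy before sending it to an external chatbot. Every entry
# below is an invented example, not data from the thesis.
TAXONOMY = {
    "acme corp": "a company",          # organization name -> generic ancestor
    "john smith": "a person",          # personal name -> generic ancestor
    "project falcon": "an internal project",
}

def protect_prompt(prompt: str) -> str:
    """Substitute each known sensitive term with its generalized form.

    Matching is done on a lowercased copy of the prompt; a real system
    would need tokenization, entity recognition, and case preservation.
    """
    result = prompt.lower()
    for term, general in TAXONOMY.items():
        result = result.replace(term, general)
    return result

print(protect_prompt("Summarize Project Falcon status for Acme Corp"))
# -> summarize an internal project status for a company
```

The generalized prompt still lets the chatbot perform the requested task, while the specific identifiers never reach the external service.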
Date of Award: Apr 2023
Original language: American English
Supervisor: Ernesto Damiani

Keywords

  • ChatGPT
  • Deep Learning
  • Exfiltration Detection
  • DGAs
  • Domain Names
  • Explainable AI
  • Machine Learning
  • Phishing URLs
  • Privacy Protection
