Data Collection Probe with Applications State Identifier for ML Based Exfiltration Detection

  • Fatema Maasmi

Student thesis: Master's Thesis


Smartphones are now more widespread than other devices such as personal computers or laptops. Their use has evolved from communication alone to many personal activities, including online shopping, banking, education, photography, and social media. Unfortunately, any device connected to a network is exposed to cyber security attacks, which come in different forms; one of the most common is malware. Many types of malware exist today with different behaviors, such as targeting the user's confidential and private data, in other words, data theft or exfiltration. Many research works have contributed to detecting malware or its behavior using different techniques, most of them based on machine learning. However, a machine learning model can still produce errors. This research focuses on improving the accuracy of a malware behavior detection model by using a second machine learning model that derives context information. This information is to be passed to the malware behavior detector, forming a cascading model. The proposed model uses system calls collected by a data collection probe developed for this project and encodes them into monogram histograms, which represent system call invocation frequency for every 20-second time window. The model provides the following information: 1. the identity of the application, through its monogram histogram; 2. the state of the application (foreground or background); 3. whether the user has interacted with the application when it is in the foreground. The research has shown that monogram (1-gram) histograms are sufficient to identify applications and their respective states, provided that the data collection process has captured all possible user interactions with the application's user interface.
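The monogram encoding described above can be illustrated with a minimal Python sketch. It assumes the probe yields (timestamp in seconds, system call name) pairs and that a fixed system call vocabulary is known in advance; the function name, the trace, and the vocabulary are hypothetical and are not taken from the thesis.

```python
from collections import Counter

WINDOW_SECONDS = 20  # window length stated in the abstract


def monogram_histograms(events, vocabulary, window=WINDOW_SECONDS):
    """Count system call invocations per fixed-length time window.

    events: iterable of (timestamp_seconds, syscall_name) pairs (assumed format).
    vocabulary: ordered list of system call names defining the histogram bins.
    Returns a dict mapping window index to a list of counts aligned with vocabulary.
    """
    per_window = {}
    for timestamp, name in events:
        index = int(timestamp // window)  # which 20-second window the call falls in
        per_window.setdefault(index, Counter())[name] += 1
    return {
        index: [counter.get(name, 0) for name in vocabulary]
        for index, counter in sorted(per_window.items())
    }


# Hypothetical trace, for illustration only.
trace = [(0.5, "read"), (3.2, "write"), (19.9, "read"), (20.0, "openat"), (41.0, "read")]
vocab = ["openat", "read", "write"]
histograms = monogram_histograms(trace, vocab)
```

Each resulting vector could then serve as one feature row for a classifier; windows with no calls are simply absent from the output in this sketch.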
In this research, however, the two models were not technically fused into a cascading model; instead, the research illustrates the correlation between false negatives and the application's state by using conditional probability. The detector produced false negatives (malicious samples classified as benign) when the malware was in the foreground with user interaction and when it was in the background, with conditional probabilities of around 60% given these states. This shows that the state of an application is potentially helpful context information for the malware behavior model in limiting the false-negative rate, since malware can disguise itself when running in the background, or during user interaction by intercepting the user's inputs and using that moment for malicious activities such as exfiltration.
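One possible reading of the conditional-probability analysis is an estimate of P(false negative | state) over malicious windows. The sketch below assumes each window is recorded as a (state, true label, predicted label) triple; the state names, labels, and records are toy values chosen for illustration and are not the thesis's data or results.

```python
from collections import Counter


def false_negative_rate_by_state(records):
    """Estimate P(false negative | state) from labeled detector outputs.

    records: iterable of (state, true_label, predicted_label) triples (assumed format).
    Only windows whose true label is "malicious" contribute to the estimate.
    """
    malicious = Counter()  # malicious windows seen per state
    missed = Counter()     # malicious windows predicted benign per state
    for state, truth, predicted in records:
        if truth != "malicious":
            continue
        malicious[state] += 1
        if predicted == "benign":
            missed[state] += 1
    return {state: missed[state] / total for state, total in malicious.items()}


# Toy records, for illustration only.
records = [
    ("foreground_interaction", "malicious", "benign"),
    ("foreground_interaction", "malicious", "malicious"),
    ("background", "malicious", "benign"),
    ("background", "malicious", "benign"),
    ("background", "malicious", "malicious"),
    ("background", "malicious", "malicious"),
    ("foreground_no_interaction", "malicious", "malicious"),
    ("background", "benign", "benign"),
]
rates = false_negative_rate_by_state(records)
```

Comparing these per-state rates is one way to check whether application state carries information about detector misses.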
Date of Award: Dec 2021
Original language: American English


  • Android OS
  • System Calls
  • Process
  • Machine Learning
  • Malware
  • Exfiltration
