Designing Anti-spam Detection by Using Locality Sensitive Hash (LSH)

  • Alyaa Al Mazrouei

Student thesis: Master's Thesis


E-mail is a critical communication medium used around the world. Cybercrimes are crimes committed over the Internet against targeted users using a variety of tools and methods, and sending spam emails is considered a major type of cybercrime. Various goals and motivations lead to sending spam: for instance, spammers send unsolicited emails for marketing purposes, to spread malware and viruses, or to carry out phishing attacks. This research discusses the spam problem: what it is, why it exists, and how to mitigate the risk. It also covers how spammers can launch a phishing campaign and how they can capture users' passwords. Machine learning is one of the best solutions for identifying and filtering spam messages, defending organizations against spam problems. A locality-sensitive hash (LSH) is one of the best ways to represent spam messages, since it keeps the representation compact and allows similarity comparison of hashed emails. This project discusses the classification of spam emails using a confusion matrix, which can later be used to define machine-learning models. Many research approaches have been proposed in this area. The aim of this project is to use minimal information to analyze emails and classify them as spam or non-spam by applying LSH to obtain a compact representation, computing the histograms of the LSH occurrences, and finding a discrimination threshold using the confusion matrix. Experimental results confirm the applicability of our approach.
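The pipeline outlined in the abstract (hash each email, count how often each hash occurs, pick a threshold from the confusion matrix) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the thesis's implementation: the abstract does not name the LSH family, so SimHash over whitespace tokens is used here, and the function names `simhash`, `confusion_matrix`, and `best_threshold`, as well as the choice of accuracy as the selection criterion, are hypothetical.

```python
import hashlib
from collections import Counter


def simhash(text, bits=64):
    """SimHash (assumed LSH family): similar texts tend to share most bits."""
    weights = [0] * bits
    for token in text.lower().split():
        # Stable 64-bit token hash taken from the first 8 bytes of MD5.
        h = int.from_bytes(hashlib.md5(token.encode()).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set when more tokens voted 1 than 0 at that position.
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def confusion_matrix(labels, predictions):
    """Return (tp, fp, fn, tn) for boolean labels, where True means spam."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fp = sum(1 for y, p in pairs if not y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    tn = sum(1 for y, p in pairs if not y and not p)
    return tp, fp, fn, tn


def best_threshold(emails, labels):
    """Pick the occurrence-count threshold with the highest accuracy."""
    hashes = [simhash(e) for e in emails]
    histogram = Counter(hashes)             # occurrences of each LSH value
    counts = [histogram[h] for h in hashes]
    best = None
    for t in sorted(set(counts)):
        predictions = [c >= t for c in counts]  # frequent hash => spam
        tp, fp, fn, tn = confusion_matrix(labels, predictions)
        accuracy = (tp + tn) / len(labels)
        if best is None or accuracy > best[1]:
            best = (t, accuracy, (tp, fp, fn, tn))
    return best


emails = ["win a free prize now"] * 3 + [
    "meeting at noon tomorrow",
    "lunch on friday?",
    "quarterly report attached",
]
labels = [True] * 3 + [False] * 3
threshold, accuracy, matrix = best_threshold(emails, labels)
```

The sketch relies on the assumption that bulk spam produces repeated hash values while legitimate mail does not, so a hash that occurs at least `threshold` times is flagged as spam. A real system would likely compare hashes by Hamming distance rather than exact equality so that near-duplicates also fall into the same histogram bin.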
Date of Award: May 2019
Original language: American English


  • Spam
  • Locality Sensitive Hash
  • Confusion Matrix
  • Phishing
  • Email
  • Machine Learning