SecureQwen: Leveraging LLMs for vulnerability detection in python codebases

Abdechakour Mechri, Mohamed Amine Ferrag, Merouane Debbah

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

Identifying vulnerabilities in software code is crucial for ensuring the security of modern systems. However, manual detection requires expert knowledge and is time-consuming, underscoring the need for automated techniques. In this paper, we present SecureQwen, a novel vulnerability detection tool leveraging large language models (LLMs) with a context length of 64K tokens to identify potential security threats in large-scale Python codebases. Utilizing a decoder-only transformer architecture, SecureQwen captures complex relationships between code tokens, enabling accurate classification of vulnerable code sequences across 14 common weakness enumerations (CWEs), including OS Command Injection, SQL Injection, Improper Check or Handling of Exceptional Conditions, Path Traversal, Broken or Risky Cryptographic Algorithm, Deserialization of Untrusted Data, and Cleartext Transmission of Sensitive Information. We evaluate SecureQwen on a large Python dataset with over 1.875 million function-level code snippets from different sources, including GitHub repositories, Codeparrot's dataset, and synthetic data generated by GPT-4o. The experimental evaluation demonstrates high accuracy, with F1 scores ranging from 84% to 99%. The results indicate that SecureQwen effectively detects vulnerabilities in human-written and AI-generated code.
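
The abstract describes the evaluation data as function-level code snippets. As a rough illustration of that unit of analysis, the sketch below splits a Python source string into per-function snippets using the standard-library `ast` module. It is not the authors' pipeline: the helper name `extract_functions` and the sample code are hypothetical, and the sample merely shows a textbook OS Command Injection pattern of the kind such a classifier would be asked to flag.

```python
import ast


def extract_functions(source: str) -> list[tuple[str, str]]:
    """Return (name, source_text) for every function defined in `source`.

    Hypothetical helper for illustration only; the paper does not
    specify how its function-level snippets were extracted.
    """
    tree = ast.parse(source)
    snippets = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # get_source_segment returns the exact text spanned by the node.
            snippets.append((node.name, ast.get_source_segment(source, node)))
    return snippets


# Illustrative input (assumed, not taken from the paper's dataset).
SAMPLE = '''
import os

def ping(host):
    # OS Command Injection pattern: untrusted input reaches a shell command
    os.system("ping -c 1 " + host)

def add(a, b):
    return a + b
'''

snippets = extract_functions(SAMPLE)
for name, text in snippets:
    print(f"--- {name} ---")
    print(text)
```

Each extracted snippet could then be passed to a classifier independently, which is one plausible reading of "function-level" detection.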

Original language: British English
Article number: 104151
Journal: Computers and Security
Volume: 148
State: Published - Jan 2025

Keywords

  • Codebase
  • Generative pre-trained transformers
  • Large language model
  • Security
  • Software security
  • Static analysis
  • Vulnerability detection
