
TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving

  • Vincenzo Colle
  • Mohamed Sana
  • Nicola Piovesan
  • Antonio De Domenico
  • Fadhel Ayed
  • Merouane Debbah
    • University of Cassino and Southern Lazio

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Large Language Models (LLMs) achieve near-human performance on general-purpose benchmarks such as MATH and GSM8K, yet their ability to solve domain-specific numerical problems remains underexplored. In telecommunications, many core tasks, from signal processing to network optimization, require precise quantitative reasoning, where errors directly impact engineering practice. Existing benchmarks evaluate LLMs on general mathematics or telecom-related language tasks, but none assess LLM capability to solve numerical mathematical problems in the telecom domain. To address this gap, we introduce TeleMath, the first benchmark dataset specifically designed to evaluate LLM performance in solving mathematical problems with numerical solutions in the telecommunications domain. TeleMath is built through a novel data generation methodology that starts from a compact seed of expert-crafted problems and expands it via problem decomposition, code- and symbolic math-driven blueprint generation, parameterized synthesis, and semantic validation. The resulting benchmark comprises 500 expert-verified question-answer pairs spanning diverse topics in the telecom domain with different levels of difficulty, each requiring a numerical solution. We then conduct a systematic benchmarking study of leading open-source LLMs, comparing reasoning-oriented models against general-purpose ones. On TeleMath, the best reasoning model, Qwen3-32B, achieves 69.51% pass@1 accuracy, substantially outperforming the best non-reasoning baseline, Qwen2.5-Math-72B-Instruct, which achieves 39.99% average accuracy with 2× more parameters, underscoring the unique challenges of telecom mathematics. We release TeleMath and its evaluation framework publicly to foster progress in domain-specific reasoning for telecommunications and beyond.

    Original language: British English
    Journal: IEEE Network
    State: Accepted/In press - 2026
