Reward shaping in DRL: A novel framework for adaptive resource management in dynamic environments

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

In edge computing environments, efficient management of computation resources is crucial for optimizing the allocation of containerized services to hosts. These environments experience dynamic user demands and high mobility, making traditional static and heuristic-based methods inadequate for handling such complexity and variability. Deep Reinforcement Learning (DRL) offers a more adaptable solution, capable of responding to these dynamic conditions. However, existing DRL methods face challenges such as high reward variability, slow convergence, and difficulties in incorporating user mobility and rapidly changing environmental configurations. To overcome these challenges, we propose a novel DRL framework for computation resource optimization at the edge layer. This framework leverages a customized Markov Decision Process (MDP) and Proximal Policy Optimization (PPO), integrating a Graph Convolutional Transformer (GCT). By combining Graph Convolutional Networks (GCN) with Transformer encoders, the GCT introduces a spatio-temporal reward-shaping mechanism that enhances the agent's ability to select hosts and assign services efficiently in real time while minimizing host overload. Our approach significantly improves the speed and accuracy of resource allocation: averaged across two datasets, it achieves a 30% reduction in convergence time, a 25% increase in total accumulated rewards, and a 35% improvement in service allocation efficiency compared to standard DRL methods and existing reward-shaping techniques. We validated the method on two real-world datasets, Mobile Data Challenge (MDC) and Shanghai Telecom, comparing it against standard DRL models, reward-shaping baselines, and heuristic methods.
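
For readers unfamiliar with reward shaping, the general idea can be illustrated with classic potential-based shaping, where the environment reward r is augmented to r + gamma * Phi(s') - Phi(s). The sketch below is an assumption-laden illustration only: it is not the GCT-based spatio-temporal mechanism summarized in the abstract, and the names `overload_potential` and `shaped_reward`, the per-host load representation, and the `capacity` and `gamma` defaults are all hypothetical.

```python
from typing import Sequence


def overload_potential(host_loads: Sequence[float], capacity: float = 1.0) -> float:
    """Hypothetical potential Phi(s): negative total load above capacity.

    A state with no overloaded host has potential 0; the more the hosts
    exceed their capacity, the lower the potential.
    """
    return -sum(max(0.0, load - capacity) for load in host_loads)


def shaped_reward(
    env_reward: float,
    loads_before: Sequence[float],
    loads_after: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s)."""
    phi_before = overload_potential(loads_before)
    phi_after = overload_potential(loads_after)
    return env_reward + gamma * phi_after - phi_before


if __name__ == "__main__":
    # Assigning a service that pushes host 0 past capacity lowers the reward.
    print(shaped_reward(1.0, [0.5, 0.5], [1.5, 0.5]))
    # Relieving an overloaded host raises it.
    print(shaped_reward(1.0, [1.5, 0.5], [0.8, 0.8]))
```

In this toy setting the shaping term gives the agent an immediate signal whenever an assignment creates or relieves overload, which is one common way shaping is used to reduce reward sparsity and speed up convergence.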

Original language: British English
Article number: 122238
Journal: Information Sciences
Volume: 715
State: Published - Oct 2025

Keywords

  • Dynamic environment
  • Fast convergence
  • On-demand
  • Reinforcement learning
  • Resource management
  • Reward shaping
