TY - JOUR
T1 - Reinforcement Learning Framework for Server Placement and Workload Allocation in Multiaccess Edge Computing
AU - Mazloomi, Anahita
AU - Sami, Hani
AU - Bentahar, Jamal
AU - Otrok, Hadi
AU - Mourad, Azzam
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2023/1/15
Y1 - 2023/1/15
N2 - Cloud computing is a reliable solution for providing distributed computation power. However, real-time response remains challenging given the enormous amount of data generated by IoT devices in 5G and 6G networks. Thus, multiaccess edge computing (MEC), which distributes edge servers in the proximity of end users to achieve low latency alongside higher processing power, is increasingly becoming a vital factor for the success of modern applications. This article addresses the problem of minimizing both the network delay, which is the main objective of MEC, and the number of edge servers, in order to provide a MEC design with minimum cost. This MEC design consists of edge server placement and base station allocation, which makes it a joint combinatorial optimization problem (COP). Recently, reinforcement learning (RL) has shown promising results for COPs. However, modeling real-world problems using RL when the state and action spaces are large still needs investigation. We propose a novel RL framework with an efficient representation and modeling of the state space, action space, and penalty function in the design of the underlying Markov decision process (MDP) for solving our problem. This modeling makes temporal difference (TD) learning applicable to a large-scale real-world problem while minimizing the cost of network design. We introduce the TD(λ) with eligibility traces for minimizing the cost (TDMC) algorithm, in addition to Q-learning for the same problem (QMC) when λ = 0. Furthermore, we discuss the impact of state representation, action space, and penalty function on the convergence of each model. Extensive experiments using real-world data sets from Shanghai Telecommunication and Citywide Public Computer Centers demonstrate that, given an efficient model, TDMC and QMC are able to find the actions that lead to a lower delayed penalty. The reported results show that our algorithm outperforms the other benchmarks by achieving a tradeoff among multiple objectives.
KW - Base station allocation
KW - edge server placement
KW - multiaccess edge computing (MEC)
KW - Q-learning
KW - reinforcement learning (RL)
KW - TD(λ)
UR - http://www.scopus.com/inward/record.url?scp=85137861359&partnerID=8YFLogxK
U2 - 10.1109/JIOT.2022.3205051
DO - 10.1109/JIOT.2022.3205051
M3 - Article
AN - SCOPUS:85137861359
SN - 2327-4662
VL - 10
SP - 1376
EP - 1390
JO - IEEE Internet of Things Journal
JF - IEEE Internet of Things Journal
IS - 2
ER -