TY - JOUR
T1 - Target localization using Multi-Agent Deep Reinforcement Learning with Proximal Policy Optimization
AU - Alagha, Ahmed
AU - Singh, Shakti
AU - Mizouni, Rabeb
AU - Bentahar, Jamal
AU - Otrok, Hadi
N1 - Funding Information:
The work is supported by the Fonds de Recherche du Québec - Nature et Technologies (FRQNT), the Natural Sciences and Engineering Research Council of Canada (NSERC), the Department of National Defence (Innovation for Defence Excellence and Security (IDEaS)), and the Abu Dhabi Award for Research Excellence (AARE19-255). The extensive simulations in this research were also enabled in part by the support provided through Calcul Québec (www.calculquebec.ca) and the Digital Research Alliance of Canada (www.alliancecan.ca).
Publisher Copyright:
© 2022 Elsevier B.V.
PY - 2022/11
Y1 - 2022/11
N2 - Target localization refers to identifying a target location based on sensory data readings gathered by sensing agents (e.g., robots, UAVs) surveying an area of interest. Existing solutions either rely on estimating the target location through fusion and analysis of the collected sensory data, or on pre-defined and data-driven survey paths. However, the adaptability of such methods remains an issue, as increasing the complexity and dynamicity of the environment requires further re-modeling and supervision. As an efficient and adaptable approach to obtaining localization agents, this work proposes several Multi-Agent Deep Reinforcement Learning (MDRL) models to tackle the target localization problem in multi-agent systems. Reinforcement Learning (RL) provides an efficient Artificial Intelligence (AI) paradigm for obtaining intelligent agents that can learn in different complex environments. In this work, an actor–critic structure is used with Convolutional Neural Networks (CNNs), which are optimized using Proximal Policy Optimization (PPO). Agents’ observations are modeled as 2D heatmaps capturing the locations and sensor readings of all agents. Cooperation among agents is induced using a team-based reward, which incentivizes agents to cooperate in localizing the target and managing their resources. Scalability with the number of agents is ensured through a Centralized Learning for Decentralized Execution approach, while scalability with the observation size is achieved through image downsampling and Gaussian filters. The efficiency of the proposed models is validated and benchmarked against existing target localization methods through experiments on single- and multi-agent systems, for tasks pertaining to radioactive target localization.
AB - Target localization refers to identifying a target location based on sensory data readings gathered by sensing agents (e.g., robots, UAVs) surveying an area of interest. Existing solutions either rely on estimating the target location through fusion and analysis of the collected sensory data, or on pre-defined and data-driven survey paths. However, the adaptability of such methods remains an issue, as increasing the complexity and dynamicity of the environment requires further re-modeling and supervision. As an efficient and adaptable approach to obtaining localization agents, this work proposes several Multi-Agent Deep Reinforcement Learning (MDRL) models to tackle the target localization problem in multi-agent systems. Reinforcement Learning (RL) provides an efficient Artificial Intelligence (AI) paradigm for obtaining intelligent agents that can learn in different complex environments. In this work, an actor–critic structure is used with Convolutional Neural Networks (CNNs), which are optimized using Proximal Policy Optimization (PPO). Agents’ observations are modeled as 2D heatmaps capturing the locations and sensor readings of all agents. Cooperation among agents is induced using a team-based reward, which incentivizes agents to cooperate in localizing the target and managing their resources. Scalability with the number of agents is ensured through a Centralized Learning for Decentralized Execution approach, while scalability with the observation size is achieved through image downsampling and Gaussian filters. The efficiency of the proposed models is validated and benchmarked against existing target localization methods through experiments on single- and multi-agent systems, for tasks pertaining to radioactive target localization.
KW - Centralized learning & distributed execution
KW - Convolutional Neural Networks
KW - Joint rewards
KW - Multi-agent deep learning
KW - Proximal Policy Optimization
KW - Target localization
UR - http://www.scopus.com/inward/record.url?scp=85133293788&partnerID=8YFLogxK
U2 - 10.1016/j.future.2022.06.015
DO - 10.1016/j.future.2022.06.015
M3 - Article
AN - SCOPUS:85133293788
SN - 0167-739X
VL - 136
SP - 342
EP - 357
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
ER -