TY - JOUR
T1 - Multiagent Deep Reinforcement Learning With Demonstration Cloning for Target Localization
AU - Alagha, Ahmed
AU - Mizouni, Rabeb
AU - Bentahar, Jamal
AU - Otrok, Hadi
AU - Singh, Shakti
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2023/8/1
Y1 - 2023/8/1
N2 - In target localization applications, readings from multiple sensing agents are processed to identify a target's location. Localization systems using stationary sensors rely on data fusion methods to estimate the target location, whereas other systems use mobile sensing agents (UAVs, robots) to search the area for the target. However, such methods are designed for specific environments and are thus infeasible if the environment changes. For instance, the presence of walls increases the environment's complexity and affects both the collected readings and the mobility of the agents. Recent works have explored deep reinforcement learning (DRL) as an efficient and adaptable approach to the target search problem. However, such methods are designed either for single-agent systems or for noncomplex environments. This work proposes two novel multiagent DRL models for target localization through search in complex environments. The first model combines proximal policy optimization, convolutional neural networks, convolutional autoencoders for creating embeddings, and a shaped reward function based on breadth-first search to obtain cooperative agents that achieve fast localization at low cost. The second model reduces the first model's computational complexity by replacing the shaped reward with a simple sparse reward, subject to the availability of expert demonstrations. These demonstrations are used in Demonstration Cloning, a novel method that leverages demonstrations to guide the learning of new agents. The proposed models are tested on a radioactive target localization scenario and benchmarked against existing methods, showing efficacy in terms of localization time and cost, as well as learning speed and stability.
AB - In target localization applications, readings from multiple sensing agents are processed to identify a target's location. Localization systems using stationary sensors rely on data fusion methods to estimate the target location, whereas other systems use mobile sensing agents (UAVs, robots) to search the area for the target. However, such methods are designed for specific environments and are thus infeasible if the environment changes. For instance, the presence of walls increases the environment's complexity and affects both the collected readings and the mobility of the agents. Recent works have explored deep reinforcement learning (DRL) as an efficient and adaptable approach to the target search problem. However, such methods are designed either for single-agent systems or for noncomplex environments. This work proposes two novel multiagent DRL models for target localization through search in complex environments. The first model combines proximal policy optimization, convolutional neural networks, convolutional autoencoders for creating embeddings, and a shaped reward function based on breadth-first search to obtain cooperative agents that achieve fast localization at low cost. The second model reduces the first model's computational complexity by replacing the shaped reward with a simple sparse reward, subject to the availability of expert demonstrations. These demonstrations are used in Demonstration Cloning, a novel method that leverages demonstrations to guide the learning of new agents. The proposed models are tested on a radioactive target localization scenario and benchmarked against existing methods, showing efficacy in terms of localization time and cost, as well as learning speed and stability.
KW - Imitation learning (IL)
KW - multiagent deep reinforcement learning (MDRL)
KW - proximal policy optimization (PPO)
KW - reward shaping
KW - target localization
UR - http://www.scopus.com/inward/record.url?scp=85151553407&partnerID=8YFLogxK
U2 - 10.1109/JIOT.2023.3262663
DO - 10.1109/JIOT.2023.3262663
M3 - Article
AN - SCOPUS:85151553407
SN - 2327-4662
VL - 10
SP - 13556
EP - 13570
JO - IEEE Internet of Things Journal
JF - IEEE Internet of Things Journal
IS - 15
ER -