TY - GEN
T1 - The Role of Time Delay in Sim2real Transfer of Reinforcement Learning for Unmanned Aerial Vehicles
AU - Elocla, Norhan Mohsen
AU - Chehadeh, Mohamad
AU - Boiko, Igor
AU - Swei, Sean
AU - Zweiri, Yahya
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - This paper investigates the simulation-to-reality gap in reinforcement learning (RL) applied to Unmanned Aerial Vehicles (UAVs) with fractional delays in the system (i.e., delays that are non-integer multiples of the sampling period). Accounting for delay substantially changes the nature of the UAV system being studied. Systems with delays are non-Markovian, and the system state vector must be extended to make the system Markovian. Based on this analysis, we present a sampling scheme that yields efficient RL training of agents that perform well in real-world UAV deployment. We show that agents trained on the Markovian system do not exhibit excessive oscillations, in contrast to an agent whose training model does not consider time delay. Our methodology for robust low-level control of the UAV hovering mode has been validated in real-world experiments. Furthermore, the real-world experiments show a qualitative match with simulation, which validates the proposed theoretical framework. A video summary of this paper can be watched at https://www.youtube.com/watch?v=1BSAA7usfK0
UR - http://www.scopus.com/inward/record.url?scp=85185837266&partnerID=8YFLogxK
U2 - 10.1109/ICAR58858.2023.10406926
DO - 10.1109/ICAR58858.2023.10406926
M3 - Conference contribution
AN - SCOPUS:85185837266
T3 - 2023 21st International Conference on Advanced Robotics, ICAR 2023
SP - 514
EP - 519
BT - 2023 21st International Conference on Advanced Robotics, ICAR 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 21st International Conference on Advanced Robotics, ICAR 2023
Y2 - 5 December 2023 through 8 December 2023
ER -