TY - JOUR
T1 - Conditional variational auto encoder based dynamic motion for multitask imitation learning
AU - Xu, Binzhao
AU - Ud Din, Muhayy
AU - Hussain, Irfan
N1 - Publisher Copyright:
© The Author(s) 2025.
PY - 2025/12
Y1 - 2025/12
N2 - The dynamic motion primitive (DMP) method is effective for learning from demonstrations. However, most current DMP-based methods focus on learning one task with one module. Although some deep learning-based frameworks can learn multiple tasks simultaneously, these methods require a large amount of training data and generalize poorly to untrained states. In this paper, we propose a framework that combines the advantages of the traditional DMP-based method and the conditional variational auto-encoder (cVAE). The encoder and decoder each comprise a dynamic system and a deep neural network. Instead of generating a trajectory directly, the deep neural networks generate a torque conditioned on the task parameters. This torque is then used to produce the desired trajectory in the dynamic system, based on the final state. In this way, the generated trajectory can adapt to a new goal position, similar to DMP. We also propose a fine-tuning method to guarantee the via-point constraint. Our model is trained and tested on a handwritten digit dataset and on robotic manipulation tasks such as pushing, reaching, and grasping. Finally, the proposed model is validated in a real robotic environment with a UR10 manipulator. In contrast to traditional data-demanding deep learning-based methods, our method achieves a 100% success rate in the reaching task and a 93.33% success rate in the pushing and grasping tasks, with only one demonstration provided per task.
AB - The dynamic motion primitive (DMP) method is effective for learning from demonstrations. However, most current DMP-based methods focus on learning one task with one module. Although some deep learning-based frameworks can learn multiple tasks simultaneously, these methods require a large amount of training data and generalize poorly to untrained states. In this paper, we propose a framework that combines the advantages of the traditional DMP-based method and the conditional variational auto-encoder (cVAE). The encoder and decoder each comprise a dynamic system and a deep neural network. Instead of generating a trajectory directly, the deep neural networks generate a torque conditioned on the task parameters. This torque is then used to produce the desired trajectory in the dynamic system, based on the final state. In this way, the generated trajectory can adapt to a new goal position, similar to DMP. We also propose a fine-tuning method to guarantee the via-point constraint. Our model is trained and tested on a handwritten digit dataset and on robotic manipulation tasks such as pushing, reaching, and grasping. Finally, the proposed model is validated in a real robotic environment with a UR10 manipulator. In contrast to traditional data-demanding deep learning-based methods, our method achieves a 100% success rate in the reaching task and a 93.33% success rate in the pushing and grasping tasks, with only one demonstration provided per task.
UR - https://www.scopus.com/pages/publications/105000305338
U2 - 10.1038/s41598-025-93888-4
DO - 10.1038/s41598-025-93888-4
M3 - Article
C2 - 40097597
AN - SCOPUS:105000305338
SN - 2045-2322
VL - 15
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 9196
ER -