TY - GEN
T1 - Enhanced Gesture Recognition Through Graph-Based Multimodal Fusion
AU - Rehman, Mobeen Ur
AU - Ilyas, Talha
AU - Seneviratne, Lakmal
AU - Hussain, Irfan
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - This study introduces an advanced framework for recognizing hand gestures from a first-person view, leveraging the integration of multimodal data including optical flow, pose, depth, and RGB video recordings, and adeptly navigating the challenges and opportunities that such integration presents. At its core, the framework employs two pivotal components: a cross-attention-based adaptive graph convolutional network and relational graph interactions for modality fusion. The former extracts features from skeleton-based gesture data, ensuring a nuanced capture of hand movements by emphasizing the interconnections within the hand's skeletal structure. The latter models each modality's output feature as a node in a fully connected relational graph, fusing heterogeneous data types through dynamic interactions between modalities. This approach leverages each data type's strengths while mitigating its weaknesses, significantly enhancing the system's classification accuracy and robustness. Tested on a public benchmark dataset, the framework achieved a remarkable accuracy of 98.48%, demonstrating its efficacy. Moreover, it proves resilient, maintaining strong performance (93.48% accuracy) even when only one modality is available, highlighting its potential for real-world applications. This advancement sets a new benchmark in hand gesture recognition, promising future developments in multimodal data fusion.
AB - This study introduces an advanced framework for recognizing hand gestures from a first-person view, leveraging the integration of multimodal data including optical flow, pose, depth, and RGB video recordings, and adeptly navigating the challenges and opportunities that such integration presents. At its core, the framework employs two pivotal components: a cross-attention-based adaptive graph convolutional network and relational graph interactions for modality fusion. The former extracts features from skeleton-based gesture data, ensuring a nuanced capture of hand movements by emphasizing the interconnections within the hand's skeletal structure. The latter models each modality's output feature as a node in a fully connected relational graph, fusing heterogeneous data types through dynamic interactions between modalities. This approach leverages each data type's strengths while mitigating its weaknesses, significantly enhancing the system's classification accuracy and robustness. Tested on a public benchmark dataset, the framework achieved a remarkable accuracy of 98.48%, demonstrating its efficacy. Moreover, it proves resilient, maintaining strong performance (93.48% accuracy) even when only one modality is available, highlighting its potential for real-world applications. This advancement sets a new benchmark in hand gesture recognition, promising future developments in multimodal data fusion.
KW - Action Recognition
KW - Cross-Attention Fusion
KW - Graph-Based Multimodal Fusion
KW - Relational Graph Interactions
KW - Skeleton-based Action Recognition
UR - https://www.scopus.com/pages/publications/85214922322
U2 - 10.1109/ICSPCC62635.2024.10770517
DO - 10.1109/ICSPCC62635.2024.10770517
M3 - Conference contribution
AN - SCOPUS:85214922322
T3 - 2024 IEEE International Conference on Signal Processing, Communications and Computing, ICSPCC 2024
BT - 2024 IEEE International Conference on Signal Processing, Communications and Computing, ICSPCC 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 14th IEEE International Conference on Signal Processing, Communications and Computing, ICSPCC 2024
Y2 - 19 August 2024 through 22 August 2024
ER -