TY - GEN
T1 - Visual-Language Alignment for Background Subtraction
AU - Liu, Jiahe
AU - Zhu, Dandan
AU - Javed, Sajid
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Background Subtraction (BGS) is a fundamental task in video analysis and is critical in many application scenarios. Although various methods have been developed to identify moving objects, current techniques still fall short when faced with the intricate challenges of real-world settings. Two persistent challenges are dynamic backgrounds, where the environmental backdrop changes constantly, and camera jitter, which introduces erratic motion into the scene. We introduce, for the first time in computer vision, a vision-language model designed for BGS tasks, integrating linguistic and visual information to improve the understanding and interpretation of complex scenes in background subtraction. Evaluated on three categories of the CDNet-2014 dataset, the model achieves an average F-measure of 0.9771. This investigation offers a new perspective and a novel solution for BGS, particularly in complex video scenarios.
KW - Background Subtraction
KW - Deep Learning
KW - Vision-Language Model
UR - https://www.scopus.com/pages/publications/85203792948
U2 - 10.1109/ICMEW63481.2024.10645430
DO - 10.1109/ICMEW63481.2024.10645430
M3 - Conference contribution
AN - SCOPUS:85203792948
T3 - 2024 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2024
BT - 2024 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2024
Y2 - 15 July 2024 through 19 July 2024
ER -