Text-Guided Multi-Modal Fusion for Underwater Visual Tracking

  • Yonathan Michael
  • Mohamad Yousif Abdulkareem Alansari
  • Sajid Javed

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

1 Scopus citation

Abstract

The integration of Natural Language (NL) descriptions with contemporary tracking algorithms is a new and rapidly growing field that shows no sign of slowing. However, the lack of comprehensive language descriptions for tracking datasets, particularly underwater tracking datasets, substantially impedes progress. The textual descriptions that accompany these datasets are typically brief and uninformative, lack details about relative location or direction of movement, and sometimes deviate from how a human would naturally describe the target in ordinary conversation. To address this, we propose richly detailed NL descriptions for UVOT400, an underwater tracking dataset. These descriptions aim to capture a wide range of factors so as to give as comprehensive an understanding of the target fish as possible. Evaluating these descriptions with contemporary language-based trackers shows superior performance compared with the best-performing visual-only trackers used to benchmark the dataset.

Original language: British English
Title of host publication: AVSS 2024 - 20th IEEE International Conference on Advanced Video and Signal-Based Surveillance
Publisher: Institute of Electrical and Electronics Engineers Inc.
Edition: 2024
ISBN (Electronic): 9798350374285
State: Published - 2024
Event: 20th IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2024 - Niagara Falls, Canada
Duration: 15 Jul 2024 to 16 Jul 2024

Conference

Conference: 20th IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2024
Country/Territory: Canada
City: Niagara Falls
Period: 15/07/24 to 16/07/24

Keywords

  • Visual Object Tracking (VOT)
  • Visual-Language Object Tracking (VLOT)
