Abstract
Object segmentation enhances robotic grasping by aiding object identification. Complex environments and dynamic conditions pose challenges such as occlusion, low light, motion blur and object size variance. To address these challenges, we propose Bimodal SegNet, a network that fuses two types of visual signal: event-based data and RGB frame data. Bimodal SegNet has two distinct encoders, one for RGB input and one for event input, in addition to an Atrous Pyramidal Feature Amplification module. The encoders capture rich contextual information at different resolutions and fuse it via a Cross-Domain Contextual Attention layer, while the decoder recovers sharp object boundaries. The proposed method is evaluated on five image degradation challenges, namely occlusion, blur, brightness, trajectory and scale variance, on the Event-based Segmentation Dataset (ESD). The results show a 4%–6% improvement over state-of-the-art methods in mean intersection over union (MIoU) and pixel accuracy. The source code, dataset and model are publicly available at: https://github.com/sanket0707/Bimodal-SegNet.
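The abstract describes fusing the two encoder streams through a Cross-Domain Contextual Attention layer. The following is a minimal sketch, not the authors' implementation, of how such bidirectional cross-attention fusion between RGB and event feature maps could look in PyTorch; the class name, head count, channel width and sum-based merge are all illustrative assumptions, and the actual code is in the repository linked above.

```python
# Hypothetical sketch of cross-attention fusion between RGB and event
# feature maps; all names and hyperparameters here are assumptions, not
# the paper's Cross-Domain Contextual Attention layer itself.
import torch
import torch.nn as nn


class CrossDomainAttentionFusion(nn.Module):
    """Fuses two modality feature maps with bidirectional cross-attention:
    each modality queries the other, and the attended streams are summed."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.rgb_to_event = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.event_to_rgb = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, rgb_feat: torch.Tensor, event_feat: torch.Tensor) -> torch.Tensor:
        # Flatten (B, C, H, W) feature maps into (B, H*W, C) token sequences.
        b, c, h, w = rgb_feat.shape
        rgb_tokens = rgb_feat.flatten(2).transpose(1, 2)
        event_tokens = event_feat.flatten(2).transpose(1, 2)

        # RGB tokens attend to event tokens, and vice versa.
        rgb_attended, _ = self.rgb_to_event(rgb_tokens, event_tokens, event_tokens)
        event_attended, _ = self.event_to_rgb(event_tokens, rgb_tokens, rgb_tokens)

        # Merge the two attended streams and restore the spatial layout.
        fused = rgb_attended + event_attended
        return fused.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    # Toy check: fuse same-resolution features from the two encoders.
    fusion = CrossDomainAttentionFusion(channels=64)
    rgb = torch.randn(1, 64, 32, 32)     # features from the RGB encoder
    events = torch.randn(1, 64, 32, 32)  # features from the event encoder
    print(fusion(rgb, events).shape)     # torch.Size([1, 64, 32, 32])
```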
| Original language | British English |
| --- | --- |
| Article number | 110215 |
| Journal | Pattern Recognition |
| Volume | 149 |
| DOIs | |
| State | Published - May 2024 |
Keywords
- Cross attention
- Deep learning
- Event vision
- Grasping
- Robotics