Abstract
Object segmentation in cluttered environments is a fundamental pre-processing step for many perception-related tasks, such as vision-based robotic grasping. Most existing object segmentation methods cannot precisely segment unknown objects, particularly in scenarios with significant occlusion. In this paper, we propose a novel approach for refining the segmentation of unknown objects in cluttered scenes. More specifically, we design a ConvMixer-based UNet model (CM-UNet) to enhance the segmentation masks and boundaries of unknown objects appearing in cluttered scenes. In our model, a ConvMixer-based Cross Fusion (CMCF) module leverages the objects' semantic and localization information, both of which are essential for successful segmentation. Furthermore, we propose using patch embedding as a pre-processing step, in which the input data is rearranged to expedite processing and improve the efficiency of the system. CM-UNet was trained and extensively tested on various challenging, publicly available datasets that include unknown objects in unstructured scenes. Thorough evaluations of segmentation accuracy and processing efficiency against state-of-the-art solutions demonstrated the superiority of our model. CM-UNet has shown its ability to efficiently improve the segmentation accuracy of unknown objects in cluttered scenes, even in the presence of occlusion.
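The abstract does not specify how the patch embedding rearranges the input. As a minimal sketch, assuming the standard ConvMixer-style scheme (non-overlapping square patches of size `p`, each flattened and linearly projected), the rearrangement can be written with NumPy as below. The function names `patchify` and `patch_embed` are hypothetical illustrations, not the authors' code. The linear projection of flattened patches is mathematically equivalent to a convolution with kernel size and stride both equal to `p`.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Rearrange an (H, W, C) image into an (H/p, W/p, p*p*C) grid of flattened patches.

    Assumption (hypothetical helper): non-overlapping square patches,
    with H and W divisible by `patch`.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by the patch size")
    # Split each spatial axis into (number of patches, position within patch).
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    # Move the two patch-grid axes to the front, followed by the in-patch axes.
    x = x.transpose(0, 2, 1, 3, 4)
    # Flatten each p x p x C patch into one feature vector.
    return x.reshape(h // patch, w // patch, patch * patch * c)


def patch_embed(image: np.ndarray, patch: int,
                weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Project every flattened patch to an embedding of size weight.shape[1].

    Equivalent to a convolution with kernel_size = stride = patch whose
    kernel, ordered (row, col, channel, out), is reshaped into `weight`.
    """
    tokens = patchify(image, patch)
    return tokens @ weight + bias


# Usage: a 224 x 224 RGB image with 7 x 7 patches gives a 32 x 32 grid of embeddings.
rng = np.random.default_rng(0)
img = rng.standard_normal((224, 224, 3))
w = rng.standard_normal((7 * 7 * 3, 256)) * 0.01
b = np.zeros(256)
emb = patch_embed(img, 7, w, b)
```

Reducing the spatial resolution by a factor of `p` in each direction before the main network is what makes this kind of pre-processing cheaper: subsequent layers operate on a (H/p) x (W/p) grid rather than on the full-resolution image.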
| Original language | British English |
|---|---|
| Pages (from-to) | 123622-123633 |
| Number of pages | 12 |
| Journal | IEEE Access |
| Volume | 10 |
| DOIs | |
| State | Published - 2022 |
Keywords
- cluttered scene
- ConvMixer-based network
- object segmentation
- robotic grasping
- UNet
- unknown objects