SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation

Abstract

Inspired by recent advances in 2D segmentation, transformer-based models have shown considerable potential for point cloud instance segmentation. Despite the promising performance of existing methods, they still face challenges such as problems with instance query initialization and excessive reliance on stacked decoder layers, which make them ill-suited to large-scale 3D scenes. In this paper, we introduce SGIFormer, a novel 3D instance segmentation method that combines semantic-guided query initialization with a geometric-enhanced interleaving transformer. Our query initialization scheme leverages predicted voxel-wise semantic information to implicitly generate non-parametric queries, providing adequate scene priors while retaining fine details. We then combine these with an additional set of learnable queries and feed the resulting overall query set into our decoder, which alternately refines the instance queries and the global scene features, capturing fine-grained information while reducing design complexity. To emphasize geometric properties, we treat bias estimation as an auxiliary task and progressively integrate embeddings of the shifted point coordinates to reinforce instance localization. SGIFormer achieves state-of-the-art performance on the ScanNet V2 and ScanNet200 datasets and on the challenging high-fidelity ScanNet++ benchmark, striking a balance between accuracy and efficiency.
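The abstract describes the pipeline only at a high level. The following is a minimal, dependency-free Python sketch of one plausible reading of it, not the authors' implementation. It assumes that non-parametric queries are seeded by ranking voxels on their highest predicted foreground-class probability, models the interleaving decoder as parameter-free residual dot-product cross-attention (a real decoder would add learned projections, feed-forward layers, and masking), and models bias estimation as a per-point offset added to the coordinates. All function names and the choice of background class indices are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_guided_queries(
    voxel_feats: Sequence[Vector],
    semantic_logits: Sequence[Sequence[float]],
    background_classes: Sequence[int],
    num_queries: int,
) -> Tuple[List[Vector], List[int]]:
    """Hypothetical non-parametric query init: reuse the features of the voxels
    most confidently predicted as some foreground (non-background) class."""
    background = set(background_classes)
    scored = []
    for idx, logits in enumerate(semantic_logits):
        probs = softmax(logits)
        fg_probs = [p for c, p in enumerate(probs) if c not in background]
        if not fg_probs:
            raise ValueError("every class is marked as background")
        scored.append((max(fg_probs), idx))
    # Highest foreground confidence first; ties broken by voxel index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    chosen = [idx for _, idx in scored[:num_queries]]
    return [list(voxel_feats[i]) for i in chosen], chosen


def build_query_set(semantic_queries, learnable_queries):
    """Overall query set: semantic-guided queries followed by learnable ones."""
    return [list(q) for q in semantic_queries] + [list(q) for q in learnable_queries]


def cross_attend(targets, sources):
    """Parameter-free residual cross-attention: each target vector adds a
    softmax-weighted average of the sources (scaled dot-product scores)."""
    scale = 1.0 / math.sqrt(len(sources[0]))
    updated = []
    for t in targets:
        weights = softmax([scale * sum(a * b for a, b in zip(t, s)) for s in sources])
        mixed = [sum(w * s[d] for w, s in zip(weights, sources)) for d in range(len(t))]
        updated.append([a + b for a, b in zip(t, mixed)])
    return updated


def interleaved_decoder(queries, scene_feats, num_layers):
    """Alternate the two refinements in every layer: the queries read the
    scene features, then the scene features read the updated queries."""
    for _ in range(num_layers):
        queries = cross_attend(queries, scene_feats)
        scene_feats = cross_attend(scene_feats, queries)
    return queries, scene_feats


def shift_coordinates(coords, offsets):
    """Auxiliary bias estimation: move each point by its predicted offset
    (assumed here to point toward its instance center) before embedding it."""
    return [[c + o for c, o in zip(p, d)] for p, d in zip(coords, offsets)]


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
    # Classes 0 and 1 play the role of background (e.g. wall, floor).
    logits = [[5.0, 0.0, 0.0, 0.0], [0.0, 0.0, 4.0, 0.0],
              [0.0, 0.0, 0.0, 2.0], [0.0, 5.0, 0.0, 0.0]]
    sem_q, chosen = semantic_guided_queries(feats, logits, [0, 1], num_queries=2)
    print(chosen)  # [1, 2]
    queries = build_query_set(sem_q, [[0.0, 0.0]] * 3)
    queries, scene = interleaved_decoder(queries, feats, num_layers=2)
    print(len(queries), len(scene))  # 5 4
```

In this toy run, the voxels dominated by the background classes are never selected, so the semantic-guided queries start from voxels that already look like objects, while the learnable queries remain free to cover whatever the semantic prediction misses.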

@article{yao2024SGIFormer,
  author  = {Lei Yao and Yi Wang and Moyun Liu and Lap-Pui Chau},
  title   = {SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation},
  journal = {arXiv preprint arXiv:2407.11564},
  year    = {2024},
}


Under Review, 2024


Figures: ScanNet++ Instance Segmentation Benchmark; ScanNet++ Visualization Results; ScanNet Visualization Results
