SGIFormer

Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation

TCSVT, 2024

1The Hong Kong Polytechnic University
2Huazhong University of Science and Technology
†Corresponding author.

Performance of our proposed SGIFormer. (a) We visualize the performance of different methods in terms of AP50 and the corresponding model size on the ScanNet validation split. SGIFormer achieves the best performance among existing methods, and even its smaller variant achieves competitive results. (b) We demonstrate fine-grained segmentation results of SGIFormer on the ScanNet++ validation set. The proposed method accurately segments small instances and captures fine-grained details even in large-scale scenes.

Abstract

In recent years, transformer-based models have exhibited considerable potential in point cloud instance segmentation. Despite the promising performance achieved by existing methods, they encounter challenges such as instance query initialization problems and excessive reliance on stacked layers, rendering them incompatible with large-scale 3D scenes. This paper introduces a novel method, named SGIFormer, for 3D instance segmentation, which is composed of the Semantic-guided Mix Query (SMQ) initialization and the Geometric-enhanced Interleaving Transformer (GIT) decoder. Specifically, the principle of our SMQ initialization scheme is to leverage the predicted voxel-wise semantic information to implicitly generate scene-aware queries, providing an adequate scene prior that complements the learnable query set. Subsequently, we feed the combined queries into our GIT decoder, which alternately refines the instance queries and the global scene features to capture fine-grained information while reducing design complexity. To emphasize geometric properties, we treat bias estimation as an auxiliary task and progressively integrate an embedding of shifted point coordinates to reinforce instance localization. SGIFormer attains state-of-the-art performance on the ScanNet V2, ScanNet200, and S3DIS datasets, as well as the challenging high-fidelity ScanNet++ benchmark, striking a balance between accuracy and efficiency.
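To make the abstract's pipeline more concrete, below is a minimal PyTorch-style sketch of the two components it describes: a semantic-guided mixed query initialization and one interleaving refinement step with a geometric coordinate embedding. All class, function, and parameter names here (SMQInit, InterleavingStep, num_scene_queries, coord_embed, etc.) are illustrative assumptions based on the abstract, not the authors' released code.

# Hypothetical sketch, assuming single-scene (unbatched) tensors:
#   voxel_feats: (N, C), shifted_coords: (N, 3), queries: (Q, C)
import torch
import torch.nn as nn


class SMQInit(nn.Module):
    """Mix scene-aware queries (selected via voxel-wise semantic scores)
    with a learnable query set."""

    def __init__(self, feat_dim, num_classes, num_scene_queries, num_learnable_queries):
        super().__init__()
        self.sem_head = nn.Linear(feat_dim, num_classes)    # voxel-wise semantic prediction
        self.score_proj = nn.Linear(num_classes, 1)         # per-voxel query-selection score
        self.num_scene_queries = num_scene_queries
        self.learnable_queries = nn.Embedding(num_learnable_queries, feat_dim)

    def forward(self, voxel_feats):
        sem_logits = self.sem_head(voxel_feats)                               # (N, num_classes)
        scores = self.score_proj(sem_logits.softmax(-1)).squeeze(-1)          # (N,)
        topk = scores.topk(self.num_scene_queries).indices
        scene_queries = voxel_feats[topk]                                     # scene-aware queries
        queries = torch.cat([scene_queries, self.learnable_queries.weight], dim=0)
        return queries, sem_logits        # sem_logits supervise the semantic branch


class InterleavingStep(nn.Module):
    """One interleaving block: refine queries against scene features,
    then refresh scene features with the updated queries."""

    def __init__(self, feat_dim, num_heads=8):
        super().__init__()
        self.query_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.scene_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.coord_embed = nn.Linear(3, feat_dim)   # embedding of shifted point coordinates

    def forward(self, queries, scene_feats, shifted_coords):
        # Geometric enhancement: add an embedding of instance-center-shifted coordinates.
        scene_feats = scene_feats + self.coord_embed(shifted_coords)
        q, s = queries.unsqueeze(0), scene_feats.unsqueeze(0)   # add batch dim for attention
        # (1) Query refinement: queries attend to scene features.
        q = q + self.query_attn(q, s, s)[0]
        # (2) Scene refinement: scene features attend to the updated queries.
        s = s + self.scene_attn(s, q, q)[0]
        return q.squeeze(0), s.squeeze(0)

In the full model, such an interleaving step would be repeated over several decoder layers, with the shifted coordinates supervised by the auxiliary bias-estimation loss mentioned in the abstract; this sketch omits mask/classification heads and losses.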


ScanNet++ V2 Instance Segmentation Benchmark

Our SGIFormer achieves state-of-the-art performance on the ScanNet++ V2 hidden test set. Results reported as of 22 January 2025.

ScanNet++ Instance Segmentation Benchmark

Our SGIFormer achieves state-of-the-art performance on the ScanNet++ V1 hidden test set. Results reported as of 24 June 2024.

ScanNet++ Visualization Results

We visualize several representative large-scale examples of SGIFormer on the ScanNet++ validation set.

ScanNet Visualization Results

We compare SGIFormer with SPFormer and Spherical Mask on the ScanNet validation set.



Acknowledgment

The research work was conducted in the JC STEM Lab of Machine Learning and Computer Vision funded by The Hong Kong Jockey Club Charities Trust.

BibTeX 🙏

@article{yao2024sgiformer,
  title={SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation},
  author={Yao, Lei and Wang, Yi and Liu, Moyun and Chau, Lap-Pui},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2024},
  publisher={IEEE}
}