SGIFormer

Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation

Under Review, 2024

1The Hong Kong Polytechnic University
2Huazhong University of Science and Technology
†Corresponding author.

Performance of our proposed SGIFormer. (a) We plot AP50 against model size for different methods on the ScanNet validation split. SGIFormer achieves the best performance among existing methods, and its smaller version remains competitive. (b) We show fine-grained segmentation results of SGIFormer on the ScanNet++ validation set. The proposed method accurately segments small instances and captures fine-grained details even in large-scale scenes.

Abstract

Inspired by recent advances in 2D segmentation, transformer-based models have exhibited considerable potential in point cloud instance segmentation. Despite the promising performance achieved by existing methods, they encounter challenges such as instance query initialization problems and excessive reliance on stacked layers, rendering them incompatible with large-scale 3D scenes. In this paper, we introduce a novel method, named SGIFormer, for 3D instance segmentation, which combines semantic-guided query initialization with a geometric-enhanced interleaving transformer. The principle of our query initialization scheme is to leverage the predicted voxel-wise semantic information to implicitly generate the non-parametric query, yielding an adequate scene prior and retaining detail. Subsequently, by incorporating another learnable query set, we feed the combined overall query into our decoder, which alternately refines the instance query and the global scene features to capture fine-grained information while reducing design complexity. To emphasize geometric properties, we treat bias estimation as an auxiliary task and progressively integrate shifted point coordinate embeddings to reinforce instance localization. SGIFormer attains state-of-the-art performance on the ScanNet V2 and ScanNet200 datasets and the challenging high-fidelity ScanNet++ benchmark, striking a balance between accuracy and efficiency.
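As a rough illustration of the ideas in the abstract, the sketch below shows one way the overall query could be assembled and how point coordinates could be shifted by a predicted bias. It is not the SGIFormer implementation: the function names are hypothetical, the top-k selection by foreground probability is an assumed stand-in for the paper's implicit semantic-guided generation, and plain Python lists replace tensors and trained parameters.

```python
# Illustrative sketch only; NOT the authors' code.
# Assumption: semantic guidance is approximated by keeping the voxels
# with the highest predicted foreground probability.

def select_semantic_queries(voxel_feats, fg_probs, num_queries):
    """Return the features (and indices) of the most confident foreground voxels."""
    order = sorted(range(len(fg_probs)), key=lambda i: fg_probs[i], reverse=True)
    top = order[:num_queries]
    return [voxel_feats[i] for i in top], top


def build_overall_query(semantic_queries, learnable_queries):
    """Concatenate scene-derived queries with a separate learnable query set."""
    return list(semantic_queries) + list(learnable_queries)


def shift_coordinates(coords, offsets):
    """Add a predicted per-point offset (bias) to each coordinate."""
    return [tuple(c + o for c, o in zip(p, d)) for p, d in zip(coords, offsets)]


# Toy data: four voxels with 2-D features and a foreground score each.
voxel_feats = [[0.1, 0.2], [0.9, 0.8], [0.4, 0.5], [0.7, 0.6]]
fg_probs = [0.05, 0.95, 0.30, 0.80]

semantic_q, picked = select_semantic_queries(voxel_feats, fg_probs, num_queries=2)
learnable_q = [[0.0, 0.0]]  # hypothetical stand-in for trained parameters
overall_q = build_overall_query(semantic_q, learnable_q)

# One point moved by its predicted offset.
shifted = shift_coordinates([(1.0, 2.0, 3.0)], [(0.5, -1.0, 0.0)])
```

In this toy setup the two semantic queries come from the scene itself, while the learnable query is independent of the input; the decoder described in the abstract would then refine this combined set alongside the global scene features.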


ScanNet++ Instance Segmentation Benchmark

SGIFormer achieves state-of-the-art performance on the ScanNet++ hidden test set. Results are as reported on 24 June 2024.

ScanNet++ Visualization Results

We visualize several representative large-scale examples of SGIFormer on the ScanNet++ validation set.

ScanNet Visualization Results

We compare SGIFormer with SPFormer and Spherical Mask on the ScanNet validation set.



Acknowledgment

The research work was conducted in the JC STEM Lab of Machine Learning and Computer Vision funded by The Hong Kong Jockey Club Charities Trust.

BibTeX 🙏

@article{yao2024SGIFormer,
  author    = {Lei Yao and Yi Wang and Moyun Liu and Lap-Pui Chau},
  title     = {SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation},
  journal   = {arXiv preprint arXiv:2407.11564},
  year      = {2024},
}