Panoramic Glass Image Segmentation Network
Abstract
In panoramic images, the geometric distortion introduced by wide-angle lenses makes it difficult for traditional semantic segmentation methods to segment glass regions accurately. To address the challenges of capturing spatial features and integrating contextual information, we propose the Panoramic Glass Image Segmentation Network (PGISNet). The network integrates the Matrix Decomposition Base Module (MDBM), the Transparent Perception Consistency Module (TACM), the Context and Texture Compensation Module (CTCM), and the Multi-scale Gated Context Attention Module (MGCA) into a progressive feature-processing pipeline. On the PanoGlassV2 benchmark, PGISNet achieves 90.03% IoU, 94.76% F-score, and 94.0% PA, outperforming existing methods and demonstrating its effectiveness for glass segmentation in panoramic images.