SpecBEV-IR: Illumination-Robust Front-End Enhancement for Multi-View BEV 3D Object Detection

Keywords

Multi-view BEV detection
Illumination-robust enhancement
Invariant cue extraction
3D object detection

DOI

10.26689/jera.v10i4.14906

Submitted: 2026-04-21
Accepted: 2026-05-06
Published: 2026-05-21

Abstract

Multi-view visual BEV 3D object detection projects image information from different camera views into a unified bird’s-eye-view (BEV) space and has become an important paradigm for autonomous driving perception because of its low cost, flexible deployment, and rich semantic information. However, under complex lighting conditions such as nighttime, backlighting, local overexposure, and uneven illumination, the brightness distribution, local contrast, and structural details of the multi-view input images are often degraded, which in turn impairs image feature extraction, view transformation, and unified spatial modeling. To address this issue, this paper proposes SpecBEV-IR, an illumination-robust multi-view BEV 3D object detection method. Built upon the SpecBEV framework, the method inserts an illumination-robust image front-end enhancement module, termed ICF, between the multi-view input images and the shared 2D encoder. The ICF module consists of an invariant cue extraction unit (ICE) and a fusion convolution unit (Fuse Conv). ICE extracts more stable illumination-invariant cues from the raw images, while Fuse Conv integrates these cues with the original image content to generate enhanced input representations for subsequent feature encoding and view transformation. Unlike conventional enhancement methods, which mainly improve visual appearance, SpecBEV-IR emphasizes structural stability and cross-view consistency for downstream 3D detection. Experiments on the nuScenes dataset show that SpecBEV-IR achieves 0.4121 mAP and 0.5174 NDS on the validation set, while also obtaining better or more balanced performance on multiple error metrics, including mATE, mASE, mAOE, and mAAE. These results indicate that the proposed method improves the overall robustness and detection performance of multi-view visual 3D object detection under complex lighting conditions.
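
The data flow of the ICF front end described in the abstract (raw views, then ICE, then Fuse Conv, then the shared 2D encoder) can be illustrated with a minimal NumPy sketch. The abstract does not specify the internals of ICE or Fuse Conv, so everything below is an assumption made for illustration: log-chromaticity stands in for the invariant cue, a fixed 1x1 channel mixing stands in for the learned fusion convolution, and the function names, tensor shapes, and weights are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def extract_invariant_cues(images, eps=1e-6):
    """Hypothetical stand-in for ICE (assumption: log-chromaticity).

    images: float array of shape (views, 3, H, W) with values in (0, 1].
    Subtracting the per-pixel mean of the log channels approximately
    cancels any brightness factor that scales all three channels equally.
    """
    log_img = np.log(images + eps)
    return log_img - log_img.mean(axis=1, keepdims=True)


def fuse(images, cues, weight, bias):
    """Hypothetical stand-in for Fuse Conv (assumption: a 1x1 convolution).

    Concatenates image and cue channels (3 + 3 = 6) and mixes them back
    to 3 channels, so the output has the same shape as the input images.
    weight: (3, 6), bias: (3,)
    """
    stacked = np.concatenate([images, cues], axis=1)      # (V, 6, H, W)
    mixed = np.einsum("oc,vchw->vohw", weight, stacked)   # (V, 3, H, W)
    return mixed + bias[None, :, None, None]


def icf_front_end(images, weight, bias):
    """ICE followed by fusion; the result would feed the shared 2D encoder."""
    cues = extract_invariant_cues(images)
    return fuse(images, cues, weight, bias)


# Six synthetic camera views (nuScenes uses six cameras); sizes are arbitrary.
rng = np.random.default_rng(0)
views = rng.uniform(0.05, 1.0, size=(6, 3, 32, 56))

# Illustrative fixed weights: keep the image, add a small share of the cues.
# In a trained model these would be learned parameters.
weight = np.concatenate([np.eye(3), 0.1 * np.eye(3)], axis=1)
bias = np.zeros(3)

enhanced = icf_front_end(views, weight, bias)
cues_bright = extract_invariant_cues(views)
cues_dark = extract_invariant_cues(0.5 * views)  # same scene, half brightness
```

In this sketch the cues computed from `views` and from `0.5 * views` agree up to the small `eps` term, which is the kind of stability under brightness change that an invariant cue is meant to provide. The enhanced output keeps the input shape, so it could be passed to an unchanged image encoder.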
