Point cloud-based 3D object detection is a core component of autonomous driving perception. Two challenges remain for long-range and small-object detection: loss of fine detail and weak context modeling. To address them, this paper proposes HFA-RCNN, a detector built on PV-RCNN. The method adds an encoder-decoder structure to the 3D sparse convolution backbone, which strengthens multi-scale context modeling and preserves more detailed features. In the bird's-eye-view (BEV) feature generation stage, it introduces a spatial-frequency aggregation network that combines complementary information from the spatial and frequency domains to improve feature representation. Experiments on the KITTI dataset show that the proposed method maintains strong detection performance for the Car category and improves detection accuracy for the Pedestrian and Cyclist categories. These results support the effectiveness of the method for long-range and small-object detection.
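The idea of combining spatial-domain and frequency-domain information on a BEV feature map can be illustrated with a minimal NumPy sketch. This is not the HFA-RCNN implementation, whose layer details are not given here. The function names, the 3x3 mean filter standing in for a learned convolution, the per-frequency weight tensor `freq_weight`, and the blending factor `alpha` are all assumptions made for illustration.

```python
import numpy as np


def spatial_branch(bev):
    """Local context: 3x3 mean filter per channel (hypothetical stand-in for a conv layer)."""
    _, h, w = bev.shape
    padded = np.pad(bev, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros_like(bev)
    for dy in range(3):
        for dx in range(3):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / 9.0


def frequency_branch(bev, freq_weight):
    """Global context: reweight each channel's 2D spectrum, then transform back."""
    spectrum = np.fft.rfft2(bev, axes=(-2, -1))
    return np.fft.irfft2(spectrum * freq_weight, s=bev.shape[-2:], axes=(-2, -1))


def spatial_frequency_aggregate(bev, freq_weight, alpha=0.5):
    """Blend the two branches; alpha is an assumed fixed mixing factor."""
    return alpha * spatial_branch(bev) + (1.0 - alpha) * frequency_branch(bev, freq_weight)


# Toy BEV feature map: 4 channels on an 8x8 grid (sizes chosen only for the example).
rng = np.random.default_rng(0)
bev = rng.standard_normal((4, 8, 8))
# rfft2 keeps W // 2 + 1 frequency columns, so the weights have shape (C, H, W // 2 + 1).
freq_weight = np.ones((4, 8, 5))
fused = spatial_frequency_aggregate(bev, freq_weight)
```

With `freq_weight` set to all ones, the frequency branch reproduces its input, so any global effect comes only from weights that differ from one. In a trained network such weights would presumably be learned rather than fixed.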
Zhou Y, Tuzel O, 2018, VoxelNet: End-to-End Learning for Point Cloud based 3D Object Detection, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4490–4499.
Yan Y, Mao Y, Li B, 2018, SECOND: Sparsely Embedded Convolutional Detection, Sensors, 18(10): 3337.
Qi C, Su H, Mo K, et al., 2017, PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 652–660.
Shi S, Wang X, Li H, 2019, PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 770–779.
Shi S, Guo C, Jiang L, et al., 2020, PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10526–10535.
Geiger A, Lenz P, Urtasun R, 2012, Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3354–3361.
Ku J, Mozifian M, Lee J, et al., 2018, Joint 3D Proposal Generation and Object Detection from View Aggregation, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 1–8.
Lang A, Vora S, Caesar H, et al., 2019, PointPillars: Fast Encoders for Object Detection from Point Clouds, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 12689–12697.
Shi S, Wang Z, Shi J, et al., 2020, Part-A^2 Net: 3D Part-Aware and Aggregation Network for Object Detection from Point Cloud, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 13338–13347.
Pan X, Xia Z, Song S, et al., 2021, 3D Object Detection with Pointformer, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7463–7472.