Traffic Police Gesture Recognition Based on an Improved YOLOv11-Pose Algorithm

Shijie Jia; Haoxiang Zhang

doi:10.26689/jera.v10i2.13662

Keywords

Pose estimation
YOLOv11-pose
Traffic police gesture recognition
C3K2-Star-CAA
Real-time processing

Submitted: 2026-03-30

Accepted: 2026-04-14

Published: 2026-04-29

Abstract

To address challenges in feature extraction and real-time processing during traffic police pose estimation, this paper proposes an improved YOLOv11-pose network for traffic police gesture recognition. The C3K2 module in the backbone network is replaced with an enhanced C3K2-Star-CAA module to extract traffic police posture features efficiently. A multi-branch star topology enables cross-level feature fusion and multi-scale information propagation, improving the model’s perception of fine posture details under complex background interference. A CAA (context anchor attention) mechanism embedded at the key feature layer uses contextual anchors to model critical locations and their spatial context, strengthening keypoint feature representation while suppressing background interference. Experiments show that the improved model achieves 78.6% mAP on a self-built dataset at a detection speed of 186.9 fps, outperforming the comparison models in both accuracy and speed. These results indicate that the approach offers a robust, practical solution for real-time traffic police gesture recognition.
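The abstract does not spell out the internals of the C3K2-Star-CAA module. As a rough illustration only, the sketch below assumes that "Star" refers to the star operation from the StarNet paper "Rewrite the Stars" (CVPR 2024), in which two linear branches of the same input are fused by element-wise multiplication. The function names, weights, and the two-channel input are hypothetical and are not taken from the paper; the CAA attention step is not modeled here.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Apply y = W x + b to a single feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def star(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    """Star operation: element-wise product of two linear branches of x."""
    branch_a = linear(w1, b1, x)
    branch_b = linear(w2, b2, x)
    return [a * b for a, b in zip(branch_a, branch_b)]


# Hypothetical toy values, chosen only to keep the arithmetic easy to follow.
x = [1.0, 2.0]
w1 = [[1.0, 0.0], [0.0, 1.0]]  # first branch: identity mapping
b1 = [0.0, 0.0]
w2 = [[0.0, 1.0], [1.0, 0.0]]  # second branch: swaps the two channels
b2 = [1.0, 1.0]

fused = star(x, w1, b1, w2, b2)
print(fused)
```

Because the two branches are multiplied rather than added, each output channel contains products of input channels, which is one way such a block can mix information across features without adding width. How the actual module arranges its branches and where CAA is applied would need to be checked against the full paper.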
