To address the problems that construction site video management faces when recognizing cigarette butts, reflective vests, and similar objects (small-target confusion, false alarms under high brightness, missed detections under occlusion, and poor adaptability to complex environments), this study proposes a multimodal-fusion algorithm for optimizing recognition accuracy. A dataset covering three modalities (visible light, infrared, and millimeter-wave) is constructed. The Dust-GAN algorithm is used to remove dust from dusty images and enhance them, and the SAA module is introduced into YOLOv8-s to improve small-target recall. Features from the three modalities are then fused, and channel pruning and quantization-aware training are applied to make the algorithm lightweight. Deployed and operated on-site for 3 months, the algorithm reduced the construction site safety accident rate by 65%, providing a solution for safety management and control at smart construction sites in complex environments.