Multi-Modal UAV Inspection of Photovoltaic Modules Using a YOLOv9-Based Fusion Network
Abstract
The rapid expansion of photovoltaic (PV) power plants has created a pressing need for efficient and reliable operation and maintenance (O&M). Traditional manual inspection is slow, costly, and error-prone, motivating the use of unmanned aerial vehicles (UAVs) equipped with infrared and visible-light cameras for automated monitoring. In this paper, we propose a YOLOv9-based multi-task framework for simultaneous PV defect detection and hazard-level classification. We construct a dataset of 5,000 annotated UAV images from the Riyue PV power plant, covering ten defect categories and four severity levels (LV1–LV4). To support severity grading, we extend the YOLOv9 architecture with a dual-task head and an ordinal regression scheme, and train the model with a compound loss that combines bounding-box regression, objectness, defect classification, and hazard-level supervision. Experimental evaluation on real UAV inspection data (224 strings, 30 ground-truth defects) shows that the proposed approach achieves an mAP50 of 95.6%, a recall of 92.7%, and a severity classification accuracy of 90.8%. The system detects both minor anomalies (e.g., bird droppings, soiling) and critical faults (e.g., missing panels, disconnections) in real time at over 40 FPS, providing actionable insights for maintenance prioritization. These results demonstrate that YOLOv9-based UAV inspection offers a robust and scalable solution for intelligent PV O&M.
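As a concrete illustration of the dual-task head, the ordinal severity scheme, and the compound loss summarized above, the following is a minimal PyTorch-style sketch. It is not the authors' released implementation: the channel width, the cumulative-logit encoding of LV1–LV4, and the loss weights (w_box, w_obj, w_cls, w_lvl) are assumptions introduced for illustration; only the four loss terms and the class/level counts come from the abstract.

# Hypothetical sketch of a dual-task detection head with ordinal severity grading.
# Layer sizes, the cumulative-logit ordinal encoding, and the loss weights are
# assumptions for illustration, not the paper's published implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_DEFECT_CLASSES = 10   # ten defect categories (per the abstract)
NUM_SEVERITY_LEVELS = 4   # hazard levels LV1-LV4 (per the abstract)

class DualTaskHead(nn.Module):
    """Per-location outputs: box (4), objectness (1), defect-class logits,
    and K-1 cumulative logits for ordinal severity grading."""
    def __init__(self, in_channels: int = 256):
        super().__init__()
        out_channels = 4 + 1 + NUM_DEFECT_CLASSES + (NUM_SEVERITY_LEVELS - 1)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, feats: torch.Tensor):
        x = self.conv(feats)  # (B, 4+1+10+3, H, W)
        box, obj, cls, lvl = torch.split(
            x, [4, 1, NUM_DEFECT_CLASSES, NUM_SEVERITY_LEVELS - 1], dim=1)
        return box, obj, cls, lvl

def ordinal_severity_loss(lvl_logits: torch.Tensor, lvl_targets: torch.Tensor):
    """Cumulative-link ordinal loss: a target level k (0-based) is encoded as k
    ones followed by zeros, so each of the K-1 logits predicts 'severity > threshold'."""
    k_minus_1 = lvl_logits.shape[-1]
    thresholds = torch.arange(k_minus_1, device=lvl_targets.device)
    targets = (lvl_targets.unsqueeze(-1) > thresholds).float()  # (N, K-1)
    return F.binary_cross_entropy_with_logits(lvl_logits, targets)

def compound_loss(box_loss, obj_loss, cls_loss, lvl_loss,
                  w_box=1.0, w_obj=1.0, w_cls=1.0, w_lvl=1.0):
    # Weighted sum of the four terms named in the abstract; the weights are placeholders.
    return w_box * box_loss + w_obj * obj_loss + w_cls * cls_loss + w_lvl * lvl_loss

In this sketch the hazard level is decoded at inference time by counting the cumulative logits whose sigmoid exceeds 0.5, which preserves the ordering of LV1–LV4 rather than treating the levels as independent classes.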