Abstract: Steel surface defect detection is limited by insufficient feature extraction and fusion and by susceptibility to background interference. To address these limitations, this paper proposes HMF-YOLO, an improved model based on YOLOv8s. First, HGNetV2 is adopted as the backbone network, and a Multi-Scale Parallel Convolution Module (MSPC) is introduced to enhance feature extraction. Second, a Multi-Layer Interactive Fusion Network (MIFN) is constructed as the neck to enable efficient interaction and fusion of semantic information across feature levels, improving the model’s multi-scale representation ability. Finally, a Frequency Domain Adaptive Enhancement Module (FAEM) is designed; it incorporates wavelet transform convolution to learn frequency-domain information from feature maps, enhancing the model’s defect perception and robustness. Experimental results on the NEU-DET and GC10-DET datasets show that HMF-YOLO reaches an mAP of 80.3% and 73.6%, respectively, improvements of 5.0% and 4.0% over YOLOv8s, while reducing the number of parameters by 50.5% and the computational cost by 35.2%. These results demonstrate that the proposed method improves detection performance while remaining lightweight, meeting the requirements of industrial applications.
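To make concrete what "frequency-domain information from feature maps" can mean, the following minimal sketch splits a single-channel feature map into one low-frequency and three high-frequency subbands with a single-level 2D Haar wavelet transform. It is an illustration only, not the FAEM design: the choice of the Haar wavelet, the function name `haar_dwt2`, and the subband names are assumptions introduced here.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform (illustrative, hypothetical helper).

    x: 2D array of shape (H, W) with even H and W.
    Returns four (H/2, W/2) subbands: low, col_diff, row_diff, diag.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")

    # The four pixels of every non-overlapping 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right

    low = (a + b + c + d) / 2.0       # smooth, low-frequency content
    col_diff = (a - b + c - d) / 2.0  # change between adjacent columns
    row_diff = (a + b - c - d) / 2.0  # change between adjacent rows
    diag = (a - b - c + d) / 2.0      # diagonal detail
    return low, col_diff, row_diff, diag


# Example: a 4x4 stand-in for one channel of a feature map.
feature = np.arange(16, dtype=float).reshape(4, 4)
low, col_diff, row_diff, diag = haar_dwt2(feature)
```

Because the transform is orthonormal, the four subbands together carry the same energy as the input, so a module operating on them can reweight low- and high-frequency content without discarding information.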