Abstract: Small object detection in unmanned aerial vehicle (UAV) aerial images is hampered by large variations in object scale, weak feature representations, complex background interference, and insufficient localization accuracy. To address these challenges, a multi-scale feature enhancement and boundary-aware network is proposed. First, a Multi-Scale Spatial Attention Augmentation Module (MSAAM) is designed to guide the network to focus adaptively on small object regions and improve feature discriminability. Second, a Multi-Scale Spatial Fusion Enhancement Module (MSFEM) is proposed, which employs parallel multi-scale branches and an efficient channel attention mechanism to enhance multi-scale feature representation. Finally, the Inner-IoU (Intersection over Union) loss function is introduced to strengthen the perception of cohesive regions within small object boundaries and improve localization accuracy. Experimental results on the VisDrone2019 dataset show that the proposed MSFE-ILNet (Multi-Scale Feature Enhancement and Inner-IoU Learning Network) achieves precision, recall, and mAP@50 of 56.5%, 44.4%, and 47.3%, respectively, significantly outperforming existing mainstream detection models. Further evaluation on the RSOD dataset indicates that the proposed method generalizes well, providing an effective solution for small object detection in UAV aerial photography.
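
As an illustration of the Inner-IoU idea named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the commonly described formulation in which both boxes are rescaled about their own centers by a factor `ratio` before the IoU is computed, and the loss is one minus that value. The function names, the `(cx, cy, w, h)` box layout, and the default `ratio=0.75` are assumptions made for this example only; the abstract does not state them.

```python
def inner_iou(box_a, box_b, ratio=0.75, eps=1e-9):
    """IoU of two auxiliary boxes obtained by scaling each input box
    about its own center by `ratio` (assumed formulation).

    Boxes are (cx, cy, w, h) tuples; this layout is an assumption.
    """
    def scaled_corners(box):
        cx, cy, w, h = box
        half_w = w * ratio / 2.0
        half_h = h * ratio / 2.0
        return cx - half_w, cy - half_h, cx + half_w, cy + half_h

    ax1, ay1, ax2, ay2 = scaled_corners(box_a)
    bx1, by1, bx2, by2 = scaled_corners(box_b)

    # Overlap of the two auxiliary boxes, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / (union + eps)


def inner_iou_loss(pred, target, ratio=0.75):
    """Hypothetical loss wrapper: 1 - Inner-IoU."""
    return 1.0 - inner_iou(pred, target, ratio)
```

With `ratio=1.0` the sketch reduces to the ordinary IoU. With `ratio < 1` the auxiliary boxes shrink toward their centers, so a center offset between prediction and ground truth lowers the overlap more sharply, which is one way to read the abstract's emphasis on regions inside the object boundary.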