Abstract: To address the problems of insufficient feature extraction for small targets, target occlusion, and poor adaptability to edge-device deployment in vehicle and pedestrian detection under complex road conditions, this paper proposes a vehicle and pedestrian detection model named AMM-YOLO. First, an Adaptive Dual-branch Attention (ADA) module is designed to dynamically balance channel and spatial feature expression, enhancing the representation of small and occluded targets. Second, a Multi-scale Weighted Concatenation Fusion (MWCF) module is constructed to maximally preserve channel information and achieve efficient multi-scale semantic fusion. Finally, the Minimum Point Distance Intersection over Union (MPDIoU) loss function is introduced to improve bounding-box localization accuracy and accelerate convergence. Experimental results demonstrate that AMM-YOLO achieves a mAP@0.5 of 63.3% on the SODA10M dataset, a 5.2 percentage point improvement over the baseline model, and a mAP@0.5 of 45.2% on the VisDrone2019 dataset, outperforming other mainstream models. After INT8 quantization, the model occupies only 3.7 MB and runs at 29.5 FPS, satisfying lightweight and real-time requirements and providing technical support for intelligent traffic monitoring and autonomous driving.
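The abstract does not spell out the MPDIoU formulation. In the literature, MPDIoU subtracts from the standard IoU the squared distances between the corresponding top-left and bottom-right corners of the predicted and ground-truth boxes, normalized by the squared input-image dimensions. A minimal sketch of that definition, assuming axis-aligned boxes in (x1, y1, x2, y2) format (the function names and scalar interface here are illustrative; a real training loop would operate on batched tensors):

```python
def mpdiou(pred, gt, img_w, img_h):
    """MPDIoU between two boxes in (x1, y1, x2, y2) format.

    pred, gt : 4-tuples of box corner coordinates
    img_w, img_h : width/height of the input image, used to normalize
                   the corner-distance penalty terms
    """
    # Standard IoU: intersection area over union area
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter + 1e-9)

    # Squared distances between matching top-left (d1) and
    # bottom-right (d2) corner pairs
    d1 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    d2 = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2

    # Normalize by the squared image dimensions
    norm = img_w ** 2 + img_h ** 2
    return iou - d1 / norm - d2 / norm


def mpdiou_loss(pred, gt, img_w, img_h):
    """Regression loss: perfectly aligned boxes give loss ~0."""
    return 1.0 - mpdiou(pred, gt, img_w, img_h)
```

Because the corner-distance terms stay non-zero even when the boxes do not overlap, the loss provides a gradient signal for disjoint predictions, which is the property usually credited with faster convergence than plain IoU loss.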