Abstract: Medical image segmentation is crucial for precision medicine, but existing U-Net-based methods still face challenges such as semantic gaps between encoder layers, inefficient multi-scale interaction, and a tendency to lose high-frequency details. To address these issues, this paper proposes a spatial-frequency domain interaction-aware network. First, a cross-layer Fourier difference attention module is designed, which combines joint modelling of frequency-domain differences with spatial attention modulation to mitigate semantic gaps between layers and enhance context awareness. Second, we propose a spatial-frequency domain collaborative module that efficiently captures multi-scale contextual information through progressive multi-scale contextual refinement. Built on grouped spectral perception modules, it explicitly enhances the key low-, mid-, and high-frequency components to strengthen detail retention, and employs gated fusion to adaptively balance the dual-domain features. Experiments on two distinct medical image segmentation tasks demonstrate that the model consistently outperforms existing deep learning methods across multiple evaluation metrics, validating its superior performance.
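To make the spectral ideas in the abstract concrete, the following is a minimal NumPy sketch of two of the named ingredients: re-weighting low-, mid-, and high-frequency components of a feature map via the 2-D FFT (in the spirit of the grouped spectral perception modules), and a sigmoid-gated fusion of spatial- and frequency-domain features. All function names, band thresholds, and gain values here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def band_masks(h, w, low=0.15, high=0.5):
    # Radial frequency masks partitioning the spectrum into
    # low / mid / high bands by normalized radius (assumed thresholds).
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    r = np.sqrt(fy**2 + fx**2) / np.sqrt(0.5)  # normalize radius to [0, 1]
    lo = (r <= low).astype(float)
    hi = (r > high).astype(float)
    mid = 1.0 - lo - hi  # the masks partition the spectrum, summing to 1
    return lo, mid, hi

def spectral_enhance(x, gains=(1.0, 1.2, 1.5)):
    # Explicitly re-weight low/mid/high frequency components of a
    # 2-D feature map; gains > 1 on the high band emphasize detail.
    F = np.fft.fft2(x)
    lo, mid, hi = band_masks(*x.shape)
    F = F * (gains[0] * lo + gains[1] * mid + gains[2] * hi)
    return np.fft.ifft2(F).real

def gated_fusion(spatial, freq):
    # Element-wise sigmoid gate adaptively balancing the two domains
    # (a simple stand-in for the paper's learned gating).
    g = 1.0 / (1.0 + np.exp(-(spatial - freq)))
    return g * spatial + (1.0 - g) * freq

x = np.random.rand(32, 32)            # toy single-channel feature map
y = gated_fusion(x, spectral_enhance(x))
```

In the actual network the gate and band gains would be learned parameters applied per channel; this sketch only shows the data flow of band-wise spectral enhancement followed by dual-domain fusion.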