Abstract: To address insufficient dynamic information processing and edge detail capture in colorectal polyp image segmentation, which lead to boundary information loss and mis-segmentation, this paper proposes a colorectal polyp segmentation method based on the Swin Transformer framework. First, the method starts from global context features and extracts lesion features stage by stage using a Transformer encoder. Second, an improved second-order channel attention mechanism models the interdependence between higher-order features, strengthens cross-level information interaction, and extracts rich multi-scale context features. Third, the discrete cosine transform (DCT) in the reverse frequency channel attention mechanism embeds more information in each channel and highlights the channel characteristics of the multi-scale context information. Finally, a cross-cue aggregation module combines these multi-scale features with monocular features to enhance the image features at both the dynamic and static depth levels, improving dynamic information processing and detail capture. On the CVC-ClinicDB, Kvasir, CVC-ColonDB, and ETIS-LaribPolypDB datasets, the method achieves Dice scores of 0.942, 0.924, 0.800, and 0.774 and mIoU scores of 0.896, 0.878, 0.726, and 0.697, respectively. These results indicate that the proposed method can effectively segment colorectal polyp images and offers a new idea for the diagnosis of colorectal polyps.
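For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Dice and IoU are commonly computed for a single pair of binary masks. The abstract does not define the metrics or say how they are averaged, so the function name `dice_and_iou`, the nested-list mask format, and the handling of empty masks are illustrative assumptions rather than the authors' evaluation code. mIoU is typically the mean of such IoU values over a test set.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as nested lists of 0/1."""
    # Flatten both masks so they can be compared pixel by pixel.
    p = [int(v) for row in pred for v in row]
    t = [int(v) for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")

    # Overlap and sizes of the predicted and ground-truth polyp regions.
    inter = sum(a & b for a, b in zip(p, t))
    p_sum, t_sum = sum(p), sum(t)
    union = p_sum + t_sum - inter

    # Assumption: two empty masks are treated as a perfect match.
    dice = 2 * inter / (p_sum + t_sum) if (p_sum + t_sum) else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


# Toy 2x3 masks: 3 predicted pixels, 3 true pixels, 2 of them shared.
pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)
```

For one image pair the two scores are related by Dice = 2·IoU / (1 + IoU), which is why the reported Dice values are consistently higher than the corresponding mIoU values.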