Abstract: Object tracking is challenging in complex scenes because of changes in brightness, background interference, and fast motion. We therefore propose an object tracking algorithm that adaptively fuses infrared and visible-light features under the guidance of an attention mechanism, exploiting the complementary strengths of the two modalities to improve on traditional object tracking algorithms in such scenes. We first apply an attention mechanism in the first three convolutional layers to select relevant features from both the infrared and visible modalities. At the same time, we dynamically assign weights to the features of the different channels, which enables adaptive feature fusion. The weighted features are then fused, and the object is tracked by the instance classification module. Experiments on the GTOT and RGBT234 datasets demonstrate the effectiveness of the proposed algorithm: it achieves an accuracy of 90.4% and a success rate of 73.2% on GTOT, and 79.6% and 56.1%, respectively, on RGBT234. These results surpass those of current mainstream algorithms.
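
As a rough illustration of the attention and adaptive fusion steps summarized in the abstract, the sketch below gates each channel by its own global average and then fuses the two modalities with per-channel weights that sum to one. It is not the network described in this paper: the sigmoid gate, the two-way softmax, the function names (`channel_descriptor`, `channel_attention`, `adaptive_fuse`), and the toy inputs are all assumptions chosen for illustration, and feature maps are represented as plain lists of flattened channels.

```python
import math

def channel_descriptor(channel):
    """Global average pooling: reduce one flattened channel to a scalar."""
    return sum(channel) / len(channel)

def channel_attention(feat):
    """Scale every channel by a sigmoid gate computed from its own average.

    Assumed form of attention for this sketch, not the paper's exact module.
    """
    out = []
    for channel in feat:
        gate = 1.0 / (1.0 + math.exp(-channel_descriptor(channel)))
        out.append([gate * v for v in channel])
    return out

def adaptive_fuse(visible, infrared):
    """Fuse two feature maps channel by channel.

    For each channel, a two-way softmax over the modality descriptors yields
    weights that sum to 1, so the modality with the stronger response
    contributes more to that channel of the fused map.
    """
    fused, weights = [], []
    for v_ch, i_ch in zip(visible, infrared):
        dv, di = channel_descriptor(v_ch), channel_descriptor(i_ch)
        m = max(dv, di)  # subtract the max for numerical stability
        ev, ei = math.exp(dv - m), math.exp(di - m)
        wv, wi = ev / (ev + ei), ei / (ev + ei)
        weights.append((wv, wi))
        fused.append([wv * a + wi * b for a, b in zip(v_ch, i_ch)])
    return fused, weights

# Toy input (hypothetical values): 2 channels, 4 spatial positions each.
visible = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
infrared = [[0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0]]
fused, weights = adaptive_fuse(channel_attention(visible),
                               channel_attention(infrared))
```

In this toy case the visible modality dominates the first channel and the infrared modality dominates the second, which is the behavior adaptive weighting is meant to produce when one sensor is more informative than the other. In a real tracker the gates and weights would be learned rather than computed from fixed formulas.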