Cross-modal image and text retrieval based on graph convolution and multi-head attention
Authors: HUA Chunjian, ZHANG Hongtu, JIANG Yi, YU Jianfeng, CHEN Ying

Affiliations:

(1. School of Mechanical Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China; 2. Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment & Technology, Wuxi, Jiangsu 214122, China; 3. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China)

About the author:

HUA Chunjian (b. 1975), male, Ph.D., associate professor, master's supervisor; his research interests include machine vision, image processing, and deep learning.


CLC number: TP391

Funding: Supported by the National Natural Science Foundation of China (62173160)



Abstract:

Existing cross-modal retrieval methods struggle to weight the data at each node and are limited in mining local consistency within a modality. To address this, a cross-modal image-text retrieval method based on a multi-head attention mechanism is proposed. First, when constructing the modal graph, each image-text sample is treated as an independent node, and graph convolution is used to extract the interaction information between samples, improving local consistency within the data of each modality. Then, an attention mechanism is introduced into the graph convolution to adaptively learn the weight coefficient of each neighboring node, thereby distinguishing the influence that different neighbors exert on the central node. Finally, a multi-head attention layer with weight parameters is constructed to fully learn multiple sets of correlated features between nodes. Compared with eight existing methods, the proposed method improves the mean average precision (mAP) by 2.6% to 42.5% on the Wikipedia dataset and by 3.3% to 54.3% on the Pascal Sentence dataset.
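To make the mechanism concrete, the sketch below implements a multi-head graph attention layer of the kind the abstract describes: each image-text sample is one node, the weight of every neighboring node is learned adaptively per head, and several heads capture multiple sets of correlated features. This is a minimal PyTorch illustration written for this page, not the authors' released code; the dense adjacency matrix, the feature dimensions, and all names (MultiHeadGraphAttention, attn_src, attn_dst) are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadGraphAttention(nn.Module):
    """Graph convolution whose neighbor weights are learned per attention head."""

    def __init__(self, in_dim: int, out_dim: int, num_heads: int = 4):
        super().__init__()
        assert out_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = out_dim // num_heads
        # Shared linear projection, split across heads.
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        # Per-head attention vectors for the central and neighboring node.
        self.attn_src = nn.Parameter(torch.empty(num_heads, self.head_dim))
        self.attn_dst = nn.Parameter(torch.empty(num_heads, self.head_dim))
        nn.init.xavier_uniform_(self.proj.weight)
        nn.init.xavier_uniform_(self.attn_src)
        nn.init.xavier_uniform_(self.attn_dst)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (N, in_dim) node features, one node per image-text sample.
        # adj: (N, N) binary adjacency; self-loops are assumed to be
        #      included so every row has at least one neighbor.
        n = x.size(0)
        h = self.proj(x).view(n, self.num_heads, self.head_dim)    # (N, H, D)
        # Raw score e_ij = LeakyReLU(a_src . h_i + a_dst . h_j), per head.
        e_src = (h * self.attn_src).sum(-1)                        # (N, H)
        e_dst = (h * self.attn_dst).sum(-1)                        # (N, H)
        e = F.leaky_relu(e_src.unsqueeze(1) + e_dst.unsqueeze(0))  # (N, N, H)
        # Keep only real neighbors, then normalize so each node's
        # coefficients sum to 1 -- the adaptive per-neighbor weighting.
        e = e.masked_fill(adj.unsqueeze(-1) == 0, float("-inf"))
        alpha = torch.softmax(e, dim=1)                            # (N, N, H)
        # Aggregate neighbor features per head, then concatenate heads.
        out = torch.einsum("ijh,jhd->ihd", alpha, h)               # (N, H, D)
        return out.reshape(n, self.num_heads * self.head_dim)

# Toy usage: five sample nodes, 512-d features, a sparse self-looped graph.
x = torch.randn(5, 512)
adj = torch.eye(5)
adj[0, 1] = adj[1, 0] = 1.0
layer = MultiHeadGraphAttention(in_dim=512, out_dim=256, num_heads=4)
print(layer(x, adj).shape)  # torch.Size([5, 256])

Masking non-neighbors before the softmax is what turns plain graph convolution (uniform or degree-normalized averaging) into the adaptive weighting the paper argues for: each head produces its own normalized coefficients, so different heads can attend to different groups of related samples.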

Cite this article:

HUA Chunjian, ZHANG Hongtu, JIANG Yi, YU Jianfeng, CHEN Ying. Cross-modal image and text retrieval based on graph convolution and multi-head attention [J]. 光电子激光 (Journal of Optoelectronics·Laser), 2024, 35(9): 925-933.

History:
  • Received: 2023-02-03
  • Available online: 2024-08-19