Is Unimodal Bias Always Bad for Visual Question Answering? A Medical Domain Study with Dynamic Attention

Sun, Zhongtian; Harit, Anoushka; Cristea, Alexandra I.; Yu, Jialin; Al Moubayed, Noura; Shi, Lei


Authors

Zhongtian Sun zhongtian.sun@durham.ac.uk
PGR Student Doctor of Philosophy

Anoushka Harit

Jialin Yu jialin.yu@durham.ac.uk
Academic Visitor

Lei Shi



Abstract

Medical visual question answering (Med-VQA) is the task of answering medical questions based on the clinical images provided. The field is still in its infancy because of the complexity of the trio formed by questions, multimodal features and expert knowledge. In this paper, we tackle a 'myth' in the Natural Language Processing area: that unimodal bias is always undesirable in learning models. Additionally, we study the effect of integrating a novel dynamic attention mechanism into such models, inspired by a recent graph deep learning study. Unlike traditional attention, dynamic attention scores are conditioned on the different query words in a question and thus enhance the representation learning ability for text. We propose that some questions are answered more accurately when the question embedding is reinforced after the multimodal features are fused. Extensive experiments on the VQA-RAD dataset demonstrate that our proposed model, reinforCe unimOdal dynamiC Attention (COCA), outperforms the state-of-the-art methods overall and performs competitively at open-ended question answering.
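The abstract does not give formulas, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes a GATv2-style scoring function for the dynamic attention (the nonlinearity is applied after the query and key are combined, so the ranking of keys can change with the query word), an element-wise product as a stand-in fusion operator, and element-wise addition as the "reinforcement" step. All function names, dimensions, and the random parameters are hypothetical and are not taken from the COCA paper.

```python
import math
import random


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def dynamic_attention(words, W, a):
    """Assumed GATv2-style scoring: score(q, k) = a . LeakyReLU(W [q ; k]).

    words: list of d-dimensional word vectors (each word acts as a query).
    W: h x 2d weight matrix, a: h-dimensional vector (both hypothetical).
    Returns the attended word vectors and the attention weights per query.
    """
    outputs, weights = [], []
    for q in words:
        scores = []
        for k in words:
            hidden = [leaky_relu(h) for h in matvec(W, q + k)]  # q + k concatenates
            scores.append(sum(ai * hi for ai, hi in zip(a, hidden)))
        alpha = softmax(scores)
        weights.append(alpha)
        outputs.append(
            [sum(al * k[j] for al, k in zip(alpha, words)) for j in range(len(q))]
        )
    return outputs, weights


def mean_pool(vectors):
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def reinforce_question(fused, question_vec):
    # Assumed reinforcement: add the question embedding back after fusion.
    return [f + q for f, q in zip(fused, question_vec)]


# Toy data: 5 question words, embedding size 4, hidden size 3 (all made up).
random.seed(0)
d, h = 4, 3
words = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(5)]
W = [[random.uniform(-1, 1) for _ in range(2 * d)] for _ in range(h)]
a = [random.uniform(-1, 1) for _ in range(h)]
image = [random.uniform(-1, 1) for _ in range(d)]

attended, weights = dynamic_attention(words, W, a)
question_vec = mean_pool(attended)
fused = [i * q for i, q in zip(image, question_vec)]  # stand-in fusion operator
reinforced = reinforce_question(fused, question_vec)
```

In this sketch the question embedding enters the final representation twice, once through fusion and once through the additive step, which is one simple way to read "reinforcement of question embedding after fusing multimodal features". The paper's actual architecture may differ.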

Citation

Sun, Z., Harit, A., Cristea, A. I., Yu, J., Al Moubayed, N., & Shi, L. (2022). Is Unimodal Bias Always Bad for Visual Question Answering? A Medical Domain Study with Dynamic Attention. https://doi.org/10.1109/bigdata55660.2022.10020791

Presentation Conference Type: Conference Paper (Published)
Conference Name: IEEE Big Data
Start Date: Dec 17, 2022
End Date: Dec 20, 2022
Acceptance Date: Oct 18, 2022
Online Publication Date: Jan 26, 2023
Publication Date: 2022
Deposit Date: Oct 20, 2022
Publicly Available Date: Dec 6, 2022
DOI: https://doi.org/10.1109/bigdata55660.2022.10020791
Public URL: https://durham-repository.worktribe.com/output/1135472

Files

Accepted Conference Proceeding (888 Kb)
PDF

Copyright Statement
Copyright © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.





