Is Unimodal Bias Always Bad for Visual Question Answering? A Medical Domain Study with Dynamic Attention

Sun, Zhongtian; Harit, Anoushka; Cristea, Alexandra I.; Yu, Jialin; Al Moubayed, Noura; Shi, Lei

doi:10.1109/bigdata55660.2022.10020791

Authors

Zhongtian Sun zhongtian.sun@durham.ac.uk
PGR Student Doctor of Philosophy

Anoushka Harit

Professor Alexandra Cristea alexandra.i.cristea@durham.ac.uk
Professor

Jialin Yu jialin.yu@durham.ac.uk
Academic Visitor

Dr Noura Al Moubayed noura.al-moubayed@durham.ac.uk
Associate Professor

Lei Shi

Abstract

Medical visual question answering (Med-VQA) aims to answer medical questions based on the clinical images provided. This field is still in its infancy due to the complexity of the trio formed of questions, multimodal features and expert knowledge. In this paper, we tackle a 'myth' in the Natural Language Processing area: that unimodal bias is always considered undesirable in learning models. Additionally, we study the effect of integrating a novel dynamic attention mechanism into such models, inspired by a recent graph deep learning study. Unlike traditional attention, dynamic attention scores are conditioned on different query words in a question and thus enhance the representation learning ability of texts. We propose that some questions are answered more accurately with a reinforcement of question embedding after fusing multimodal features. Extensive experiments have been conducted on the VQA-RAD dataset and demonstrate that our proposed model, reinforCe unimOdal dynamiC Attention (COCA), outperforms the state-of-the-art methods overall and performs competitively at open-ended question answering.
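The abstract gives no equations, so the following is only an illustrative sketch of the two ideas it names, not the COCA implementation. It assumes a GATv2-style scoring form (a linear map on the concatenated query and key, then a LeakyReLU, then a dot product with a learned vector) as one possible reading of "dynamic attention", and a simple additive residual as one possible reading of "reinforcement of question embedding". The function names, the weights `W` and `a`, and the mixing factor `alpha` are all hypothetical.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def dynamic_attention(query, keys, W, a):
    """Assumed GATv2-style score: a . LeakyReLU(W [query ; key]).

    Because the nonlinearity is applied after query and key are mixed,
    the ranking of the keys can change from one query word to another.
    """
    scores = []
    for key in keys:
        hidden = [leaky_relu(h) for h in matvec(W, query + key)]
        scores.append(sum(ai * hi for ai, hi in zip(a, hidden)))
    return softmax(scores)

def reinforce_question(fused, question, alpha=0.5):
    """Hypothetical reinforcement: add the question embedding back
    onto the fused multimodal vector, scaled by alpha."""
    return [f + alpha * q for f, q in zip(fused, question)]

# Toy weights (illustrative only): the score is largest when key == query.
W = [[1.0, -1.0], [-1.0, 1.0]]
a = [-1.0, -1.0]
keys = [[1.0], [-1.0]]

weights_pos = dynamic_attention([1.0], keys, W, a)   # favours keys[0]
weights_neg = dynamic_attention([-1.0], keys, W, a)  # favours keys[1]
reinforced = reinforce_question([1.0, 2.0], [2.0, 4.0], alpha=0.5)
```

With these toy weights the two query words attend most strongly to different keys, which is the behaviour the abstract attributes to dynamic attention; a scoring function whose key ranking is identical for every query could not do this.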

Citation

Sun, Z., Harit, A., Cristea, A. I., Yu, J., Al Moubayed, N., & Shi, L. (2022, December). Is Unimodal Bias Always Bad for Visual Question Answering? A Medical Domain Study with Dynamic Attention. Presented at IEEE Big Data, Osaka, Japan

Presentation Conference Type	Conference Paper (published)
Conference Name	IEEE Big Data
Start Date	Dec 17, 2022
End Date	Dec 20, 2022
Acceptance Date	Oct 18, 2022
Online Publication Date	Jan 26, 2023
Publication Date	2022
Deposit Date	Oct 20, 2022
Publicly Available Date	Dec 6, 2022
DOI	https://doi.org/10.1109/bigdata55660.2022.10020791
Public URL	https://durham-repository.worktribe.com/output/1135472

Files

Accepted Conference Proceeding (888 Kb)
PDF

Copyright Statement
Copyright © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
