Zhongtian Sun zhongtian.sun@durham.ac.uk
PGR Student Doctor of Philosophy
Is Unimodal Bias Always Bad for Visual Question Answering? A Medical Domain Study with Dynamic Attention
Sun, Zhongtian; Harit, Anoushka; Cristea, Alexandra I.; Yu, Jialin; Al Moubayed, Noura; Shi, Lei
Authors
Anoushka Harit
Professor Alexandra Cristea alexandra.i.cristea@durham.ac.uk
Professor
Jialin Yu jialin.yu@durham.ac.uk
Academic Visitor
Dr Noura Al Moubayed noura.al-moubayed@durham.ac.uk
Associate Professor
Lei Shi
Abstract
Medical visual question answering (Med-VQA) is the task of answering medical questions based on the clinical images provided. This field is still in its infancy due to the complexity of the trio formed by questions, multimodal features and expert knowledge. In this paper, we tackle a 'myth' in the Natural Language Processing area: that unimodal bias is always considered undesirable in learning models. Additionally, we study the effect of integrating a novel dynamic attention mechanism into such models, inspired by a recent graph deep learning study. Unlike traditional attention, dynamic attention scores are conditioned on different query words in a question and thus enhance the representation learning ability of texts. We propose that some questions are answered more accurately with a reinforcement of the question embedding after fusing multimodal features. Extensive experiments have been conducted on the VQA-RAD dataset and demonstrate that our proposed model, reinforCe unimOdal dynamiC Attention (COCA), outperforms the state-of-the-art methods overall and performs competitively at open-ended question answering.
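The contrast between traditional and dynamic attention described in the abstract can be illustrated with a toy sketch. The abstract does not name the graph deep learning study or give the scoring function, so the code below assumes a GAT-style score for the "static" case and a GATv2-style score for the "dynamic" case. All weights, vectors and function names are hypothetical values chosen for illustration, and this is not the COCA implementation.

```python
# Toy contrast between static and dynamic attention scoring.
# Assumption: static = LeakyReLU(a^T [Wq || Wk]), dynamic = a^T LeakyReLU(W [q || k]).
# Every weight and vector here is made up for illustration.

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, v):
    return [dot(row, v) for row in W]

def static_scores(q, keys, W, a):
    # The nonlinearity is applied after the final projection, so the query
    # only shifts every score by the same amount: key ranking never changes.
    hq = matvec(W, q)
    return [leaky_relu(dot(a, hq + matvec(W, k))) for k in keys]

def dynamic_scores(q, keys, W, a):
    # The nonlinearity sits between the two projections, so the score is a
    # genuinely joint function of query and key: ranking can depend on q.
    return [dot(a, [leaky_relu(z) for z in matvec(W, q + k)]) for k in keys]

def top_key(scores):
    # Index of the highest-scoring key.
    return max(range(len(scores)), key=lambda i: scores[i])

queries = [[1.0, -1.0], [-1.0, 1.0]]   # two hypothetical query-word vectors
keys = [[1.0, -1.0], [-1.0, 1.0]]      # two hypothetical key vectors

W_static = [[1.0, 0.0], [0.0, 1.0]]    # identity projection
a_static = [0.5, -0.3, 1.0, 0.0]       # acts on [Wq || Wk]

W_dynamic = [[1.0, 0.0, 1.0, 0.0],     # acts on [q || k]
             [0.0, 1.0, 0.0, 1.0]]
a_dynamic = [1.0, 1.0]

static_top = [top_key(static_scores(q, keys, W_static, a_static)) for q in queries]
dynamic_top = [top_key(dynamic_scores(q, keys, W_dynamic, a_dynamic)) for q in queries]

print(static_top)   # [0, 0]: both queries prefer the same key
print(dynamic_top)  # [0, 1]: the preferred key depends on the query
```

In this toy setting the static scores rank the keys identically for every query, whereas the dynamic scores let each query word select a different key, which is the property the abstract attributes to dynamic attention.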
Citation
Sun, Z., Harit, A., Cristea, A. I., Yu, J., Al Moubayed, N., & Shi, L. (2022, December). Is Unimodal Bias Always Bad for Visual Question Answering? A Medical Domain Study with Dynamic Attention. Presented at IEEE Big Data, Osaka, Japan
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | IEEE Big Data |
Start Date | Dec 17, 2022 |
End Date | Dec 20, 2022 |
Acceptance Date | Oct 18, 2022 |
Online Publication Date | Jan 26, 2023 |
Publication Date | 2022 |
Deposit Date | Oct 20, 2022 |
Publicly Available Date | Dec 6, 2022 |
DOI | https://doi.org/10.1109/bigdata55660.2022.10020791 |
Public URL | https://durham-repository.worktribe.com/output/1135472 |
Files
Accepted Conference Proceeding
(888 Kb)
PDF
Copyright Statement
Copyright © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.