T Winterbottom
On Modality Bias in the TVQA Dataset
Winterbottom, T; Xiao, S; McLean, A; Al Moubayed, N
Abstract
TVQA is a large-scale video question answering (video-QA) dataset based on popular TV shows. Its questions were specifically designed to require “both vision and language understanding to answer”. In this work, we demonstrate an inherent bias in the dataset towards the textual subtitle modality. We infer this bias both directly and indirectly, notably finding that models trained with subtitles learn, on average, to suppress the contribution of video features. Our results show that models trained on only the visual information can answer ∼45% of the questions, while models using only the subtitles achieve ∼68%. We find that a bilinear-pooling-based joint representation of the modalities reduces model performance by 9%, implying a reliance on modality-specific information. We also show that TVQA does not benefit from the RUBi modality bias reduction technique popularised in VQA. By improving only the text processing, using BERT embeddings with the simple model first proposed for TVQA, we achieve state-of-the-art results (72.13%) compared to the highly complex STAGE model (70.50%). We recommend a multimodal evaluation framework that can highlight biases in models and isolate visually and textually reliant subsets of the data. Using this framework, we propose subsets of TVQA that respond exclusively to either or both modalities, in order to facilitate multimodal modelling as TVQA originally intended.
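The abstract does not spell out how the visually and textually reliant subsets are isolated. As a minimal sketch, assuming per-question correctness flags are available from a subtitle-only model and a video-only model, questions can be grouped by which single-modality model answers them, and a multimodal model can then be scored on each group. The function names, subset labels and toy data below are hypothetical illustrations, not the paper's code or results.

```python
from typing import Dict, Set

# Hypothetical subset labels; the paper's own subset names are not given in this record.
SUBSET_NAMES = ("both", "subtitle_only", "video_only", "neither")


def partition_by_modality(
    subtitle_correct: Dict[str, bool],
    video_correct: Dict[str, bool],
) -> Dict[str, Set[str]]:
    """Group question IDs by which single-modality model answered them correctly.

    Only questions scored by both models are partitioned.
    """
    subsets: Dict[str, Set[str]] = {name: set() for name in SUBSET_NAMES}
    for qid in subtitle_correct.keys() & video_correct.keys():
        by_sub, by_vid = subtitle_correct[qid], video_correct[qid]
        if by_sub and by_vid:
            subsets["both"].add(qid)
        elif by_sub:
            subsets["subtitle_only"].add(qid)
        elif by_vid:
            subsets["video_only"].add(qid)
        else:
            subsets["neither"].add(qid)
    return subsets


def subset_accuracy(model_correct: Dict[str, bool], qids: Set[str]) -> float:
    """Fraction of the given questions that a model (e.g. a multimodal one) answers correctly."""
    if not qids:
        raise ValueError("cannot compute accuracy on an empty subset")
    return sum(model_correct[qid] for qid in qids) / len(qids)


# Toy, made-up correctness flags for four questions (not results from the paper).
sub_only = {"q1": True, "q2": True, "q3": False, "q4": False}
vid_only = {"q1": True, "q2": False, "q3": True, "q4": False}
full = {"q1": True, "q2": True, "q3": False, "q4": True}

parts = partition_by_modality(sub_only, vid_only)
for name in SUBSET_NAMES:
    print(name, sorted(parts[name]), subset_accuracy(full, parts[name]))
```

Under this assumed setup, a multimodal model that scores well on the subtitle-only group but poorly on the video-only group would be one signal of the subtitle reliance described above; this is an interpretation for illustration, not a procedure stated in the record.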
Citation
Winterbottom, T., Xiao, S., McLean, A., & Al Moubayed, N. (2020, September). On Modality Bias in the TVQA Dataset. Presented at The British Machine Vision Conference (BMVC), Manchester, England
Field | Value |
---|---|
Presentation Conference Type | Conference Paper (published) |
Conference Name | The British Machine Vision Conference (BMVC) |
Start Date | Sep 7, 2020 |
End Date | Sep 10, 2020 |
Acceptance Date | Aug 3, 2020 |
Online Publication Date | Aug 25, 2020 |
Publication Date | 2020 |
Deposit Date | Aug 25, 2020 |
Publicly Available Date | Dec 1, 2020 |
Public URL | https://durham-repository.worktribe.com/output/1140271 |
Publisher URL | https://www.bmvc2020-conference.com/programme/accepted-papers/ |
Files
Accepted Conference Proceeding (PDF, 771 Kb)