CXR-IRGen: An Integrated Vision and Language Model for the Generation of Clinically Accurate Chest X-Ray Image-Report Pairs
Shentu, Junjie; Al Moubayed, Noura
Authors
Junjie Shentu junjie.shentu@durham.ac.uk
Intern (Casual)
Dr Noura Al Moubayed noura.al-moubayed@durham.ac.uk
Associate Professor
Abstract
Chest X-Ray (CXR) images play a crucial role in clinical practice, providing vital support for diagnosis and treatment. Augmenting CXR datasets with synthetically generated CXR images annotated with radiology reports can enhance the performance of deep learning models on various tasks. However, existing studies have primarily focused on generating unimodal data, either images or reports. In this study, we propose an integrated model, CXR-IRGen, designed specifically for generating CXR image-report pairs. Our model follows a modularized structure consisting of a vision module and a language module. Notably, we present a novel prompt design for the vision module that combines a text embedding with the image embedding of a reference image. Additionally, we propose a new CXR report generation model as the language module, which effectively leverages a large language model and a self-supervised learning strategy. Experimental results demonstrate that our new prompt improves the general quality (FID) and clinical efficacy (AUROC) of the generated images, with average improvements of 15.84% and 1.84%, respectively. Moreover, the proposed CXR report generation model outperforms baseline models in terms of clinical efficacy (F1 score) and exhibits a high level of alignment between image and text: the best F1 score of our model is 6.93% higher than that of the state-of-the-art CXR report generation model. Our code is available at https://github.com/junjie-shentu/CXR-IRGen.
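As a rough illustration of what "combining a text embedding with the image embedding of a reference image" could look like, the sketch below appends a projected image embedding as one extra token to a sequence of text token embeddings. This is only a minimal sketch under stated assumptions: the abstract does not say how the two embeddings are combined, so the concatenation, the function name `combine_prompt`, the projection matrix `proj`, and all dimensions are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def combine_prompt(text_emb, image_emb, proj):
    """Append a projected reference-image embedding to the text token embeddings.

    text_emb:  (T, D) array of text token embeddings
    image_emb: (D_img,) array, embedding of the reference image
    proj:      (D_img, D) hypothetical projection into the text embedding space
    """
    # Map the image embedding into the same space as the text tokens.
    image_token = image_emb @ proj  # shape (D,)
    # Treat it as one additional prompt token after the text tokens.
    return np.concatenate([text_emb, image_token[None, :]], axis=0)  # (T + 1, D)


# Illustrative sizes only (assumed, not from the paper).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(77, 768))
image_emb = rng.normal(size=(512,))
proj = rng.normal(size=(512, 768)) / np.sqrt(512)

prompt = combine_prompt(text_emb, image_emb, proj)
print(prompt.shape)  # (78, 768)
```

In a conditional image generator, a sequence like `prompt` would typically be passed wherever the text-only conditioning was used before; the authors' repository linked above is the reference for how CXR-IRGen actually does this.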
Citation
Shentu, J., & Al Moubayed, N. (2024). CXR-IRGen: An Integrated Vision and Language Model for the Generation of Clinically Accurate Chest X-Ray Image-Report Pairs. In 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (5200-5209). https://doi.org/10.1109/WACV57701.2024.00513
Presentation Conference Type | Conference Paper (Published) |
---|---|
Conference Name | 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) |
Start Date | Jan 3, 2024 |
End Date | Jan 8, 2024 |
Acceptance Date | Nov 1, 2023 |
Online Publication Date | Apr 9, 2024 |
Publication Date | Apr 9, 2024 |
Deposit Date | Nov 23, 2023 |
Publicly Available Date | Apr 9, 2024 |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 5200-5209 |
Series ISSN | 2472-6737 |
Book Title | 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) |
ISBN | 9798350318937 |
DOI | https://doi.org/10.1109/WACV57701.2024.00513 |
Public URL | https://durham-repository.worktribe.com/output/1948292 |
Files
Accepted Conference Paper (PDF, 2.6 MB)
Licence
http://creativecommons.org/licenses/by/4.0/
Copyright Statement
This accepted manuscript is licensed under the Creative Commons Attribution 4.0 licence. https://creativecommons.org/licenses/by/4.0/