CXR-IRGen: An Integrated Vision and Language Model for the Generation of Clinically Accurate Chest X-Ray Image-Report Pairs

Shentu, Junjie; Al Moubayed, Noura

doi:10.1109/WACV57701.2024.00513

Authors

Junjie Shentu junjie.shentu@durham.ac.uk
PGR Student Doctor of Philosophy

Dr Noura Al Moubayed noura.al-moubayed@durham.ac.uk
Associate Professor

Abstract

Chest X-Ray (CXR) images play a crucial role in clinical practice, providing vital support for diagnosis and treatment. Augmenting the CXR dataset with synthetically generated CXR images annotated with radiology reports can enhance the performance of deep learning models for various tasks. However, existing studies have primarily focused on generating unimodal data of either images or reports. In this study, we propose an integrated model, CXR-IRGen, designed specifically for generating CXR image-report pairs. Our model follows a modularized structure consisting of a vision module and a language module. Notably, we present a novel prompt design for the vision module that combines the text embedding with the image embedding of a reference image. Additionally, we propose a new CXR report generation model as the language module, which effectively leverages a large language model and a self-supervised learning strategy. Experimental results demonstrate that our new prompt improves the general quality (FID) and clinical efficacy (AUROC) of the generated images, with average improvements of 15.84% and 1.84%, respectively. Moreover, the proposed CXR report generation model outperforms baseline models in terms of clinical efficacy (F1 score) and exhibits a high level of image-text alignment: the best F1 score of our model is 6.93% higher than that of the state-of-the-art CXR report generation model. Our code is available at https://github.com/junjie-shentu/CXR-IRGen.
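
The improvements quoted in the abstract are percentages over a baseline, and the two image metrics point in opposite directions: a lower FID is better, while a higher AUROC is better. The sketch below shows one common way such a relative improvement could be computed. The function name `relative_improvement` and all input values are hypothetical, chosen only for illustration; they are not results from the paper, and the paper may compute its percentages differently.

```python
def relative_improvement(baseline: float, new: float, lower_is_better: bool = False) -> float:
    """Percentage improvement of `new` over `baseline` (assumed definition).

    A positive result means `new` is better than `baseline` in the
    direction given by `lower_is_better`.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    # Flip the sign of the change for metrics where smaller values are better.
    change = (baseline - new) if lower_is_better else (new - baseline)
    return 100.0 * change / abs(baseline)


# Hypothetical metric values, not taken from the paper.
fid_gain = relative_improvement(baseline=10.0, new=8.0, lower_is_better=True)  # FID: lower is better
auroc_gain = relative_improvement(baseline=0.80, new=0.82)                     # AUROC: higher is better

print(f"FID improvement:   {fid_gain:.2f}%")
print(f"AUROC improvement: {auroc_gain:.2f}%")
```

Under this definition, a drop in FID and a rise in AUROC both register as positive improvements, which matches how the abstract reports the two figures side by side.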

Citation

Shentu, J., & Al Moubayed, N. (2024, January). CXR-IRGen: An Integrated Vision and Language Model for the Generation of Clinically Accurate Chest X-Ray Image-Report Pairs. Presented at 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, Hawaii, USA

Presentation Conference Type	Conference Paper (published)
Conference Name	2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Start Date	Jan 3, 2024
End Date	Jan 8, 2024
Acceptance Date	Nov 1, 2023
Online Publication Date	Apr 9, 2024
Publication Date	Apr 9, 2024
Deposit Date	Nov 23, 2023
Publicly Available Date	Apr 9, 2024
Publisher	Institute of Electrical and Electronics Engineers
Peer Reviewed	Peer Reviewed
Pages	5200-5209
Series ISSN	2472-6737
Book Title	2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
ISBN	9798350318937
DOI	https://doi.org/10.1109/WACV57701.2024.00513
Public URL	https://durham-repository.worktribe.com/output/1948292

Files

Accepted Conference Paper (2.6 MB)
PDF

Licence
http://creativecommons.org/licenses/by/4.0/

Copyright Statement
This accepted manuscript is licensed under the Creative Commons Attribution 4.0 licence. https://creativecommons.org/licenses/by/4.0/