CXR-IRGen: An Integrated Vision and Language Model for the Generation of Clinically Accurate Chest X-Ray Image-Report Pairs

Shentu, Junjie; Al Moubayed, Noura


Authors

Junjie Shentu junjie.shentu@durham.ac.uk
PGR Student Doctor of Philosophy



Abstract

Chest X-Ray (CXR) images play a crucial role in clinical practice, providing vital support for diagnosis and treatment. Augmenting the CXR dataset with synthetically generated CXR images annotated with radiology reports can enhance the performance of deep learning models for various tasks. However, existing studies have primarily focused on generating unimodal data of either images or reports. In this study, we propose an integrated model, CXR-IRGen, designed specifically for generating CXR image-report pairs. Our model follows a modularized structure consisting of a vision module and a language module. Notably, we present a novel prompt design for the vision module by combining both text embedding and image embedding of a reference image. Additionally, we propose a new CXR report generation model as the language module, which effectively leverages a large language model and self-supervised learning strategy. Experimental results demonstrate that our new prompt is capable of improving the general quality (FID) and clinical efficacy (AUROC) of the generated images, with average improvements of 15.84% and 1.84%, respectively. Moreover, the proposed CXR report generation model outperforms baseline models in terms of clinical efficacy (F1 score) and exhibits a high-level alignment of image and text, as the best F1 score of our model is 6.93% higher than the state-of-the-art CXR report generation model. Our code is available at https://github.com/junjie-shentu/CXR-IRGen.
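
The abstract describes the vision-module prompt as a combination of a text embedding and the image embedding of a reference image, but does not say how the two are combined. The sketch below illustrates one plausible reading only and is not taken from the authors' code: it assumes the image embedding is linearly projected to the text embedding width and appended as one extra conditioning token. The function name `build_prompt`, the projection matrix, and all dimensions are hypothetical.

```python
import random


def build_prompt(text_tokens, image_embedding, projection):
    """Append a projected reference-image embedding to the text token embeddings.

    Assumption (not from the paper): the two modalities are combined by
    projecting the image embedding to the text width and adding it as one
    extra token in the conditioning sequence.
    """
    text_dim = len(text_tokens[0])
    if len(projection) != text_dim:
        raise ValueError("projection must have one row per text dimension")
    if any(len(row) != len(image_embedding) for row in projection):
        raise ValueError("projection row length must match the image embedding")
    # Matrix-vector product: map the image embedding into the text embedding space.
    image_token = [
        sum(w * x for w, x in zip(row, image_embedding)) for row in projection
    ]
    # Copy the text tokens so the caller's list is left untouched.
    return [list(token) for token in text_tokens] + [image_token]


# Hypothetical toy sizes: 3 text tokens of width 4, image embedding of width 6.
random.seed(0)
text_tokens = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
image_embedding = [random.gauss(0, 1) for _ in range(6)]
projection = [[random.gauss(0, 1) for _ in range(6)] for _ in range(4)]

prompt = build_prompt(text_tokens, image_embedding, projection)
print(len(prompt), len(prompt[0]))  # 4 4
```

In a diffusion-style image generator, a sequence like `prompt` would typically serve as the conditioning input; whether CXR-IRGen concatenates, sums, or otherwise fuses the embeddings should be checked against the linked repository.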

Citation

Shentu, J., & Al Moubayed, N. (2024). CXR-IRGen: An Integrated Vision and Language Model for the Generation of Clinically Accurate Chest X-Ray Image-Report Pairs. In 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (5200-5209). https://doi.org/10.1109/WACV57701.2024.00513

Conference Name 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Conference Location Waikoloa, Hawaii, USA
Start Date Jan 3, 2024
End Date Jan 8, 2024
Acceptance Date Nov 1, 2023
Online Publication Date Apr 9, 2024
Publication Date Apr 9, 2024
Deposit Date Nov 23, 2023
Publicly Available Date Apr 9, 2024
Publisher Institute of Electrical and Electronics Engineers
Pages 5200-5209
Series ISSN 2472-6737
Book Title 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
ISBN 9798350318937
DOI https://doi.org/10.1109/WACV57701.2024.00513
Public URL https://durham-repository.worktribe.com/output/1948292
