Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion using Transformers

Corona-Figueroa, Abril; Bond-Taylor, Sam; Bhowmik, Neelanjan; Gaus, Yona Falinie A.; Breckon, Toby P.; Shum, Hubert P.H.; Willcocks, Chris G.

doi:10.1109/ICCV51070.2023.01341

Authors

Abril Corona Figueroa abril.corona-figueroa@durham.ac.uk
PGR Student Doctor of Philosophy

Samuel Bond-Taylor samuel.e.bond-taylor@durham.ac.uk
PGR Student Doctor of Philosophy

Dr Neelanjan Bhowmik neelanjan.bhowmik@durham.ac.uk
Post Doctoral Research Associate

Yona Binti Abd Gaus yona.f.binti-abd-gaus@durham.ac.uk
Post Doctoral Research Associate

Professor Toby Breckon toby.breckon@durham.ac.uk
Professor

Professor Hubert Shum hubert.shum@durham.ac.uk
Professor

Dr Chris Willcocks christopher.g.willcocks@durham.ac.uk
Associate Professor

Abstract

Generating 3D images of complex objects conditionally from a few 2D views is a difficult synthesis problem, compounded by issues such as domain gap and geometric misalignment. For instance, unified frameworks such as Generative Adversarial Networks (GANs) cannot achieve this unless they explicitly define a joint latent distribution that is both domain-invariant and geometry-invariant, whereas Neural Radiance Fields (NeRFs) are generally unable to handle both issues because they optimize at the pixel level. By contrast, we propose a simple and novel 2D to 3D synthesis approach based on conditional diffusion with vector-quantized codes. Operating in an information-rich code space enables high-resolution 3D synthesis via full-coverage attention across the views. Specifically, we generate the 3D codes (e.g. for CT images) conditional on previously generated 3D codes and the entire codebook of two 2D views (e.g. 2D X-rays). Qualitative and quantitative results on two datasets of complex volumetric imagery found in real-world scenarios demonstrate state-of-the-art performance over specialized methods across varied evaluation criteria, including fidelity metrics such as density and coverage, and distortion metrics.
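The abstract does not spell out the noise process behind "conditional diffusion with vector-quantized codes". The following is a minimal, hypothetical sketch that assumes an absorbing-state (mask-based) discrete diffusion: continuous features are snapped to their nearest codebook entry, and at timestep t each 3D code is independently replaced by a mask token with probability t/T while the 2D view codes stay visible as conditioning. The names `quantize`, `absorb`, `build_input`, the `MASK` id and the toy codebook are illustrative assumptions, not the authors' implementation; the transformer that would predict the masked 3D codes is omitted.

```python
import random
from typing import List, Sequence

# ASSUMPTION: hypothetical token id for the absorbing [MASK] state.
MASK = -1


def quantize(vectors: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each continuous vector to the index of its nearest codebook entry
    (squared Euclidean distance), as in a VQ autoencoder bottleneck."""
    indices = []
    for v in vectors:
        best = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(v, codebook[k])),
        )
        indices.append(best)
    return indices


def absorb(codes: Sequence[int], t: int, T: int,
           rng: random.Random) -> List[int]:
    """ASSUMED absorbing-state forward process: each code is independently
    replaced by MASK with probability t / T (t = 0 keeps all, t = T masks all)."""
    p = t / T
    return [MASK if rng.random() < p else c for c in codes]


def build_input(view_codes: Sequence[int], volume_codes: Sequence[int],
                t: int, T: int, rng: random.Random) -> List[int]:
    """Concatenate the (never masked) 2D view codes with the corrupted 3D codes.
    A transformer would attend over the whole sequence to predict masked 3D codes."""
    return list(view_codes) + absorb(volume_codes, t, T, rng)


# Toy usage with made-up 2D features and codebook (hypothetical data).
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
feats = [[0.1, -0.2], [0.9, 1.2], [4.0, 6.0]]
views = quantize(feats, codebook)   # stand-in for 2D X-ray view codes
print(views)                        # [0, 1, 2]

rng = random.Random(0)
volume = [2, 1, 0, 2, 1, 0]         # stand-in for 3D CT volume codes
print(build_input(views, volume, t=0, T=10, rng=rng))   # [0, 1, 2, 2, 1, 0, 2, 1, 0]
print(build_input(views, volume, t=10, T=10, rng=rng))  # [0, 1, 2, -1, -1, -1, -1, -1, -1]
```

In this sketch, keeping the 2D view codes unmasked in the same sequence is what would let a transformer attend across all views when predicting each 3D code. The reverse (generation) process, which would iteratively fill in masked codes from the model's predictions, is not shown.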

Citation

Corona-Figueroa, A., Bond-Taylor, S., Bhowmik, N., Gaus, Y. F. A., Breckon, T. P., Shum, H. P. H., & Willcocks, C. G. (2023, October). Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion using Transformers. Presented at ICCV23: 2023 IEEE/CVF International Conference on Computer Vision, Paris, France

Presentation Conference Type	Conference Paper (published)
Conference Name	ICCV23: 2023 IEEE/CVF International Conference on Computer Vision
Start Date	Oct 2, 2023
End Date	Oct 6, 2023
Acceptance Date	Jul 14, 2023
Online Publication Date	Jan 15, 2024
Publication Date	2023
Deposit Date	Aug 30, 2023
Publicly Available Date	Dec 31, 2023
Publisher	Institute of Electrical and Electronics Engineers
Series ISSN	1550-5499
Book Title	ICCV '23: Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision
ISBN	9798350307191
DOI	https://doi.org/10.1109/ICCV51070.2023.01341
Public URL	https://durham-repository.worktribe.com/output/1726461
Publisher URL	https://ieeexplore.ieee.org/xpl/conhome/1000149/all-proceedings

Copyright Statement
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.