Samuel Bond-Taylor samuel.e.bond-taylor@durham.ac.uk
PGR Student, Doctor of Philosophy
Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes
Bond-Taylor, S.E.; Hessey, P.; Sasaki, H.; Breckon, T.P.; Willcocks, C.G.
Authors
P. Hessey
H. Sasaki
Professor Toby Breckon toby.breckon@durham.ac.uk
Professor
Dr Chris Willcocks christopher.g.willcocks@durham.ac.uk
Associate Professor
Abstract
Whilst diffusion probabilistic models can generate high-quality image content, key limitations remain in terms of both generating high-resolution imagery and their associated high computational requirements. Recent Vector-Quantized image models have overcome this limitation of image resolution but are prohibitively slow and unidirectional as they generate tokens via element-wise autoregressive sampling from the prior. By contrast, in this paper we propose a novel discrete diffusion probabilistic model prior which enables parallel prediction of Vector-Quantized tokens by using an unconstrained Transformer architecture as the backbone. During training, tokens are randomly masked in an order-agnostic manner and the Transformer learns to predict the original tokens. This parallelism of Vector-Quantized token prediction in turn facilitates unconditional generation of globally consistent high-resolution and diverse imagery at a fraction of the computational expense. In this manner, we can generate image resolutions exceeding that of the original training set samples whilst additionally provisioning per-image likelihood estimates (in a departure from generative adversarial approaches). Our approach achieves state-of-the-art results in terms of the manifold overlap metrics Coverage (LSUN Bedroom: 0.83; LSUN Churches: 0.73; FFHQ: 0.80) and Density (LSUN Bedroom: 1.51; LSUN Churches: 1.12; FFHQ: 1.20), and performs competitively on FID (LSUN Bedroom: 3.27; LSUN Churches: 4.07; FFHQ: 6.11) whilst offering advantages in terms of both computation and reduced training set requirements.
Citation
Bond-Taylor, S., Hessey, P., Sasaki, H., Breckon, T., & Willcocks, C. (2022, October). Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes. Presented at ECCV 2022: European Conference on Computer Vision, Tel Aviv, Israel.
| Presentation Conference Type | Conference Paper (Published) |
| --- | --- |
| Conference Name | ECCV 2022: European Conference on Computer Vision |
| Start Date | Oct 23, 2022 |
| End Date | Oct 27, 2022 |
| Acceptance Date | Jul 8, 2022 |
| Online Publication Date | Oct 28, 2022 |
| Publication Date | Oct 2022 |
| Deposit Date | Oct 12, 2022 |
| Publicly Available Date | Oct 29, 2023 |
| Print ISSN | 0302-9743 |
| Publisher | Springer Verlag |
| Volume | 13683 |
| Pages | 170-188 |
| Series Title | Lecture Notes in Computer Science |
| Book Title | Computer Vision – ECCV 2022 |
| Public URL | https://durham-repository.worktribe.com/output/1135659 |
| Publisher URL | https://eccv2022.ecva.net/ |
Files
Accepted Conference Proceeding (PDF, 24 MB)
Copyright Statement
The final authenticated version is available online at https://doi.org/10.1007/978-3-031-20050-2_11
You might also like
Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion using Transformers
(2023)
Presentation / Conference Contribution
MedNeRF: Medical Neural Radiance Fields for Reconstructing 3D-aware CT-Projections from a Single X-ray
(2022)
Presentation / Conference Contribution
Gradient Origin Networks
(2021)
Presentation / Conference Contribution
Shape tracing: An extension of sphere tracing for 3D non-convex collision in protein docking
(2020)
Presentation / Conference Contribution