HINT: High-quality INpainting Transformer with Mask-Aware Encoding and Enhanced Attention
Chen, Shuang; Atapour-Abarghouei, Amir; Shum, Hubert P. H.
Authors
Chris Chen shuang.chen@durham.ac.uk
Post Doctoral Research Associate
Dr Amir Atapour-Abarghouei amir.atapour-abarghouei@durham.ac.uk
Assistant Professor
Professor Hubert Shum hubert.shum@durham.ac.uk
Professor
Abstract
Existing image inpainting methods leverage convolution-based downsampling approaches to reduce spatial dimensions. This may result in information loss from corrupted images where the available information is inherently sparse, especially for the scenario of large missing regions. Recent advances in self-attention mechanisms within transformers have led to significant improvements in many computer vision tasks including inpainting. However, limited by the computational costs, existing methods cannot fully exploit the efficacy of long-range modelling capabilities of such models. In this paper, we propose an end-to-end High-quality INpainting Transformer, abbreviated as HINT, which consists of a novel mask-aware pixel-shuffle downsampling module (MPD) to preserve the visible information extracted from the corrupted image while maintaining the integrity of the information available for high-level inferences made within the model. Moreover, we propose a Spatially-activated Channel Attention Layer (SCAL), an efficient self-attention mechanism interpreting spatial awareness to model the corrupted image at multiple scales. To further enhance the effectiveness of SCAL, motivated by recent advances in speech recognition, we introduce a sandwich structure that places feed-forward networks before and after the SCAL module. We demonstrate the superior performance of HINT compared to contemporary state-of-the-art models on four datasets, CelebA, CelebA-HQ, Places2, and Dunhuang.
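The abstract's point about downsampling can be illustrated with a small NumPy sketch. It is not the authors' MPD module: it only shows the generic pixel-unshuffle rearrangement, which trades spatial resolution for channels without discarding any value, applied to both a corrupted image and its mask. The helper name `mask_aware_downsample`, the mask convention (1 = visible, 0 = missing), and the choice to concatenate the rearranged mask as extra channels are assumptions made for illustration only.

```python
import numpy as np


def pixel_unshuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C, H, W) array into (C*r*r, H//r, W//r).

    Every input value is kept; each r x r spatial block is moved
    into the channel dimension instead of being averaged or skipped.
    """
    c, h, w = x.shape
    if h % r or w % r:
        raise ValueError("height and width must be divisible by r")
    x = x.reshape(c, h // r, r, w // r, r)   # index order: [c, h', i, w', j]
    x = x.transpose(0, 2, 4, 1, 3)           # index order: [c, i, j, h', w']
    return x.reshape(c * r * r, h // r, w // r)


def mask_aware_downsample(image: np.ndarray, mask: np.ndarray, r: int = 2) -> np.ndarray:
    """Hypothetical helper (an assumption, not the paper's MPD).

    Rearranges image and mask identically and stacks them along the
    channel axis, so later layers can see which values were visible.
    """
    return np.concatenate(
        [pixel_unshuffle(image, r), pixel_unshuffle(mask, r)], axis=0
    )


# Toy 3-channel 4x4 image with a 2x2 hole in the centre (mask: 1 = visible).
image = np.arange(3 * 4 * 4, dtype=np.float32).reshape(3, 4, 4)
mask = np.ones((1, 4, 4), dtype=np.float32)
mask[:, 1:3, 1:3] = 0.0

features = mask_aware_downsample(image * mask, mask, r=2)
print(features.shape)  # spatial size halves, channel count grows by r*r
```

Because the rearrangement is a permutation of the input values, the number of elements is unchanged, in contrast to strided convolution or pooling, which reduce it.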
Citation
Chen, S., Atapour-Abarghouei, A., & Shum, H. P. H. (2024). HINT: High-quality INpainting Transformer with Mask-Aware Encoding and Enhanced Attention. IEEE Transactions on Multimedia, 26, 7649-7660. https://doi.org/10.1109/TMM.2024.3369897
Journal Article Type | Article |
---|---|
Acceptance Date | Feb 20, 2024 |
Online Publication Date | Mar 4, 2024 |
Publication Date | Mar 4, 2024 |
Deposit Date | Feb 23, 2024 |
Publicly Available Date | Mar 14, 2024 |
Journal | IEEE Transactions on Multimedia |
Print ISSN | 1520-9210 |
Electronic ISSN | 1941-0077 |
Publisher | Institute of Electrical and Electronics Engineers |
Peer Reviewed | Peer Reviewed |
Volume | 26 |
Pages | 7649-7660 |
DOI | https://doi.org/10.1109/TMM.2024.3369897 |
Public URL | https://durham-repository.worktribe.com/output/2272815 |
Files
Accepted Journal Article
(27.3 Mb)
PDF
Licence
http://creativecommons.org/licenses/by/4.0/
Copyright Statement
This accepted manuscript is licensed under the Creative Commons Attribution 4.0 licence. https://creativecommons.org/licenses/by/4.0/