SEM-Net: Efficient Pixel Modelling for Image Inpainting with Spatially Enhanced SSM

Chen, Shuang; Zhang, Haozheng; Atapour-Abarghouei, Amir; Shum, Hubert P. H.

doi:10.1109/WACV61041.2025.00055

Authors

Chris Chen shuang.chen@durham.ac.uk
Post Doctoral Research Associate

Haozheng Zhang haozheng.zhang@durham.ac.uk
PGR Student Doctor of Philosophy

Dr Amir Atapour-Abarghouei amir.atapour-abarghouei@durham.ac.uk
Assistant Professor

Professor Hubert Shum hubert.shum@durham.ac.uk
Professor

Abstract

Image inpainting aims to repair a partially damaged image based on information from the known regions of the image. Achieving semantically plausible inpainting results is particularly challenging because the reconstructed regions must exhibit patterns similar to those of semantically consistent regions. This requires a model with a strong capacity to capture long-range dependencies (LRDs). Existing models struggle in this regard: Convolutional Neural Network (CNN) based methods suffer from slow growth of the receptive field, and Transformer-based methods rely on patch-level interactions, both of which are ineffective for capturing LRDs. Motivated by this, we propose SEM-Net, a novel State Space Model (SSM) vision network that models corrupted images at the pixel level while capturing LRDs in state space, achieving linear computational complexity. To address the inherent lack of spatial awareness in SSMs, we introduce the Snake Mamba Block (SMB) and the Spatially-Enhanced Feed-forward Network. These innovations enable SEM-Net to outperform state-of-the-art inpainting methods on two distinct datasets, showing significant improvements in capturing LRDs and enhanced spatial consistency. Additionally, SEM-Net achieves state-of-the-art performance on motion deblurring, demonstrating its generalizability. Our source code is available at https://github.com/ChrisChenl023/SEM-Net.
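
As a rough illustration of two ideas named in the abstract, pixel-level sequence modelling with a state space recurrence and a snake-like traversal of the image, the following minimal sketch may help. It is not the SEM-Net implementation. The function names, the boustrophedon ordering, and the fixed scalar parameters a, b and c are all assumptions made for illustration; Mamba-style models use learned, input-dependent parameters and vector-valued states.

```python
def snake_order(height, width):
    """Return (row, col) pairs in a boustrophedon (snake-like) order.

    Assumed traversal for illustration: even rows run left to right,
    odd rows run right to left, so consecutive pixels stay adjacent.
    """
    order = []
    for r in range(height):
        cols = range(width) if r % 2 == 0 else range(width - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order


def linear_ssm_scan(sequence, a=0.9, b=1.0, c=1.0):
    """Scalar discrete state space recurrence (hypothetical toy parameters).

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    The loop visits each element once, so the cost grows linearly
    with the number of pixels in the sequence.
    """
    h = 0.0
    outputs = []
    for x in sequence:
        h = a * h + b * x
        outputs.append(c * h)
    return outputs


# Toy 3x3 "image" with a single bright pixel in the top-left corner.
image = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
order = snake_order(3, 3)
pixels = [image[r][c] for r, c in order]
outputs = linear_ssm_scan(pixels, a=0.5)
```

In this toy setting the first pixel still contributes to the output at the last pixel, attenuated by the decay factor at every step, which hints at how a recurrent state can carry information across the whole image in a single linear-time pass.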

Citation

Chen, S., Zhang, H., Atapour-Abarghouei, A., & Shum, H. P. H. (2025, February). SEM-Net: Efficient Pixel Modelling for Image Inpainting with Spatially Enhanced SSM. Presented at 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, Arizona.

Presentation Conference Type	Conference Paper (published)
Conference Name	2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Start Date	Feb 26, 2025
End Date	Mar 6, 2025
Acceptance Date	Oct 28, 2024
Online Publication Date	Apr 8, 2025
Publication Date	Apr 8, 2025
Deposit Date	Nov 11, 2024
Publicly Available Date	Apr 8, 2025
Publisher	Institute of Electrical and Electronics Engineers
Peer Reviewed	Peer Reviewed
Pages	461-471
Series ISSN	2472-6737
Book Title	Proceedings of the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision
DOI	https://doi.org/10.1109/WACV61041.2025.00055
Public URL	https://durham-repository.worktribe.com/output/3091371