Dynamic Unary Convolution in Transformers
Duan, Haoran; Long, Yang; Wang, Shidong; Zhang, Haofeng; Willcocks, Chris G.; Shao, Ling
Authors
Haoran Duan haoran.duan@durham.ac.uk
PGR Student Doctor of Philosophy
Dr Yang Long yang.long@durham.ac.uk
Associate Professor
Shidong Wang
Haofeng Zhang
Dr Chris Willcocks christopher.g.willcocks@durham.ac.uk
Associate Professor
Ling Shao
Abstract
It is uncertain whether the power of transformer architectures can complement existing convolutional neural networks. A few recent attempts have combined convolution with transformer design through a range of structures in series; the main contribution of this paper is to explore a parallel design approach instead. While previous transformer-based approaches need to segment the image into patch-wise tokens, we observe that multi-head self-attention conducted on convolutional features is mainly sensitive to global correlations, and that performance degrades when these correlations are not exhibited. We propose two modules that run in parallel with multi-head self-attention to enhance the transformer. For local information, a dynamic local enhancement module leverages convolution to dynamically and explicitly enhance positive local patches and suppress the response to less informative ones. For mid-level structure, a novel unary co-occurrence excitation module utilizes convolution to actively search for local co-occurrence between patches. The parallel-designed Dynamic Unary Convolution in Transformer (DUCT) blocks are aggregated into a deep architecture, which is comprehensively evaluated across essential computer vision tasks: image-based classification, segmentation, retrieval and density estimation. Both qualitative and quantitative results show that our parallel convolutional-transformer approach with dynamic and unary convolution outperforms existing series-designed structures.
Citation
Duan, H., Long, Y., Wang, S., Zhang, H., Willcocks, C. G., & Shao, L. (2023). Dynamic Unary Convolution in Transformers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(11), 12747 - 12759. https://doi.org/10.1109/tpami.2022.3233482
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Jan 1, 2023 |
Online Publication Date | Jan 2, 2023 |
Publication Date | Nov 1, 2023 |
Deposit Date | Jan 16, 2023 |
Publicly Available Date | Jan 16, 2023 |
Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
Print ISSN | 0162-8828 |
Electronic ISSN | 1939-3539 |
Publisher | Institute of Electrical and Electronics Engineers |
Peer Reviewed | Peer Reviewed |
Volume | 45 |
Issue | 11 |
Pages | 12747 - 12759 |
DOI | https://doi.org/10.1109/tpami.2022.3233482 |
Public URL | https://durham-repository.worktribe.com/output/1185317 |
Files
Accepted Journal Article (PDF, 14.6 Mb)
Copyright Statement
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.