Shanmin Pang
Feature fine-tuning and attribute representation transformation for zero-shot learning
Pang, Shanmin; He, Xin; Hao, Wenyu; Long, Yang
Abstract
Zero-Shot Learning (ZSL) aims to generalize a pre-trained classification model to unseen classes with the help of auxiliary semantic information. Recent generative methods follow the paradigm of synthesizing unseen visual data from class attributes: a generative adversarial network is trained to learn a mapping from semantic attributes to visual features extracted by a pre-trained backbone such as ResNet101. Considering the domain-shift problem between the pre-trained backbone and the target ZSL dataset, as well as the information asymmetry between images and attributes, this manuscript suggests that the visual-semantic balance should be learnt separately from the ZSL models. In particular, we propose a plug-and-play Attribute Representation Transformation (ART) framework that pre-processes visual features with a contrastive regression module and an attribute place-holder module. The contrastive regression loss is tailored to visual-attribute transformation and gains favorable properties from both classification and regression losses. For the attribute place-holder module, an end-to-end mapping loss function is introduced to build the relationship between transformed features and semantic attributes. Experiments conducted on five popular benchmarks show that the proposed ART framework can significantly benefit existing generative models in both ZSL and generalized ZSL settings.
Citation
Pang, S., He, X., Hao, W., & Long, Y. (2023). Feature fine-tuning and attribute representation transformation for zero-shot learning. Computer Vision and Image Understanding, 236, Article 103811. https://doi.org/10.1016/j.cviu.2023.103811
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Aug 18, 2023 |
Online Publication Date | Sep 1, 2023 |
Publication Date | 2023-11 |
Deposit Date | Oct 23, 2023 |
Publicly Available Date | Sep 2, 2024 |
Journal | Computer Vision and Image Understanding |
Print ISSN | 1077-3142 |
Electronic ISSN | 1090-235X |
Publisher | Elsevier |
Peer Reviewed | Peer Reviewed |
Volume | 236 |
Article Number | 103811 |
DOI | https://doi.org/10.1016/j.cviu.2023.103811 |
Public URL | https://durham-repository.worktribe.com/output/1815174 |
Files
Accepted Journal Article
(2.2 Mb)
PDF
Licence
http://creativecommons.org/licenses/by-nc-nd/4.0/
Publisher Licence URL
http://creativecommons.org/licenses/by-nc-nd/4.0/
Copyright Statement
© 2023. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/