Feature fine-tuning and attribute representation transformation for zero-shot learning

Pang, Shanmin; He, Xin; Hao, Wenyu; Long, Yang

doi:10.1016/j.cviu.2023.103811

Authors

Shanmin Pang

Xin He

Wenyu Hao

Dr Yang Long yang.long@durham.ac.uk
Assistant Professor

Abstract

Zero-Shot Learning (ZSL) aims to generalize a pretrained classification model to unseen classes with the help of auxiliary semantic information. Recent generative methods follow the paradigm of synthesizing unseen visual data from class attributes: a generative adversarial network is trained to learn a mapping from semantic attributes to visual features extracted by a pre-trained backbone such as ResNet101. Considering the domain-shift problem between the pre-trained backbone and the target ZSL dataset, as well as the information asymmetry between images and attributes, this manuscript suggests that the visual-semantic balance should be learnt separately from the ZSL models. In particular, we propose a plug-and-play Attribute Representation Transformation (ART) framework that pre-processes visual features with a contrastive regression module and an attribute place-holder module. Our contrastive regression loss is tailored to visual-attribute transformation and gains favorable properties from both classification and regression losses. For the attribute place-holder module, an end-to-end mapping loss function is introduced to build the relationship between transformed features and semantic attributes. Experiments on five popular benchmarks show that the proposed ART framework can significantly benefit existing generative models in both the ZSL and generalized ZSL settings.
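
The generative paradigm mentioned in the abstract (synthesize visual features for unseen classes from their attributes, then classify real features against them) can be illustrated with a toy sketch. This is not the authors' ART framework or any of its losses: the linear "generator" with fixed weights stands in for a trained GAN generator, and the class names, attribute vectors, weights, and nearest-centroid classifier are all hypothetical choices made for illustration.

```python
import random


def synthesize_features(attributes, weights, n_samples, noise=0.05, seed=0):
    """Toy conditional generator: a linear map of the attribute vector
    plus Gaussian noise. ASSUMPTION: stands in for a trained GAN generator."""
    rng = random.Random(seed)
    mean = [sum(w * a for w, a in zip(row, attributes)) for row in weights]
    return [[m + rng.gauss(0.0, noise) for m in mean] for _ in range(n_samples)]


def centroid(samples):
    """Per-dimension mean of a list of feature vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def classify(feature, centroids):
    """Assign the class whose synthetic centroid is nearest (squared L2)."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(centroids, key=lambda name: sq_dist(feature, centroids[name]))


# Hypothetical unseen classes, described only by 3-dimensional attributes.
unseen_attributes = {
    "zebra": [1.0, 0.0, 1.0],
    "whale": [0.0, 1.0, 0.0],
}

# Hypothetical generator weights (2-dimensional feature space). In a real
# pipeline these would be learned on seen classes, not written by hand.
generator_weights = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]

# Synthesize features for each unseen class and summarize them by a centroid.
centroids = {
    name: centroid(synthesize_features(attrs, generator_weights, n_samples=20))
    for name, attrs in unseen_attributes.items()
}

# Classify a "real" feature vector from an unseen class.
prediction = classify([1.4, 0.1], centroids)
```

In this sketch the classifier never sees real samples of the unseen classes; it relies entirely on features synthesized from attributes, which is why the quality of the visual features and of the attribute representation matters for such methods.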

Citation

Pang, S., He, X., Hao, W., & Long, Y. (2023). Feature fine-tuning and attribute representation transformation for zero-shot learning. Computer Vision and Image Understanding, 236, Article 103811. https://doi.org/10.1016/j.cviu.2023.103811

Journal Article Type	Article
Acceptance Date	Aug 18, 2023
Online Publication Date	Sep 1, 2023
Publication Date	2023-11
Deposit Date	Oct 23, 2023
Publicly Available Date	Sep 2, 2024
Journal	Computer Vision and Image Understanding
Print ISSN	1077-3142
Electronic ISSN	1090-235X
Publisher	Elsevier
Peer Reviewed	Peer Reviewed
Volume	236
Article Number	103811
DOI	https://doi.org/10.1016/j.cviu.2023.103811
Public URL	https://durham-repository.worktribe.com/output/1815174