Teng Ma
Towards Cognition-Aligned Visual Language Models via Zero-Shot Instance Retrieval
Authors: Ma, Teng; Organisciak, Daniel; Ma, Wenbao; Long, Yang
Contributors: Arkaitz Zubiaga (Editor)
Abstract
The pursuit of Artificial Intelligence (AI) that emulates human cognitive processes is a cornerstone of ethical AI development, ensuring that emerging technologies can integrate seamlessly into societal frameworks that require nuanced understanding and decision-making. Zero-Shot Instance Retrieval (ZSIR) stands at the forefront of this endeavour, potentially providing a robust platform for AI systems, particularly large visual language models, to demonstrate and refine cognition-aligned learning without the need for direct experience. In this paper, we critically evaluate current cognition alignment methodologies within traditional zero-shot learning paradigms using visual attributes and word embeddings generated by large AI models. We propose a unified similarity function that quantifies the level of cognitive alignment, bridging the gap between AI processes and human-like understanding. Through extensive experimentation, our findings illustrate that this similarity function can effectively mirror the visual–semantic gap, steering the model towards enhanced performance in Zero-Shot Instance Retrieval. Our models achieve state-of-the-art bi-directional image–attribute retrieval accuracy on both the SUN (92.8% and 82.2%) and CUB (59.92% and 48.82%) datasets. This work not only benchmarks the cognition alignment of AI but also sets a new precedent for the development of visual language models attuned to the complexities of human cognition.
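The abstract reports accuracy in two directions: image→attribute and attribute→image. As a rough illustration of how such a bi-directional top-1 retrieval score can be computed, here is a minimal sketch under stated assumptions. It assumes image and attribute embeddings already live in a shared space and uses plain cosine similarity as a stand-in, because the abstract does not specify the paper's unified similarity function. All function names (`cosine`, `retrieval_accuracy`, `bidirectional_accuracy`) and the toy vectors are hypothetical and do not come from the authors' code.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; a stand-in for the paper's (unspecified) similarity function."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_u * norm_v)


def retrieval_accuracy(queries: List[Vector], query_labels: List[int],
                       gallery: List[Vector], gallery_labels: List[int]) -> float:
    """Top-1 accuracy: a query counts as a hit if its best-scoring gallery item shares its label."""
    hits = 0
    for q, q_label in zip(queries, query_labels):
        scores = [cosine(q, g) for g in gallery]
        best = max(range(len(gallery)), key=scores.__getitem__)  # ties go to the lowest index
        hits += int(gallery_labels[best] == q_label)
    return hits / len(queries)


def bidirectional_accuracy(image_emb: List[Vector], attr_emb: List[Vector],
                           image_labels: List[int]) -> Tuple[float, float]:
    """Hypothetical helper: image_labels[i] is the row of attr_emb describing image i's class."""
    class_ids = list(range(len(attr_emb)))
    img_to_attr = retrieval_accuracy(image_emb, image_labels, attr_emb, class_ids)
    attr_to_img = retrieval_accuracy(attr_emb, class_ids, image_emb, image_labels)
    return img_to_attr, attr_to_img


if __name__ == "__main__":
    # Toy data: three images, two classes; the third image is closer to the wrong class.
    images = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
    attrs = [[1.0, 0.0], [0.0, 1.0]]
    labels = [0, 1, 0]
    print(bidirectional_accuracy(images, attrs, labels))  # (0.6666666666666666, 1.0)
```

Keeping the two directions as separate calls to one function makes the asymmetry explicit: with many images per class, attribute→image only needs one correct image at the top of the ranking, while image→attribute must classify every image, so the two scores can differ substantially.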
Citation
Ma, T., Organisciak, D., Ma, W., & Long, Y. (2024). Towards Cognition-Aligned Visual Language Models via Zero-Shot Instance Retrieval. Electronics, 13(9), Article 1660. https://doi.org/10.3390/electronics13091660
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Apr 16, 2024 |
Online Publication Date | Apr 25, 2024 |
Publication Date | Apr 25, 2024 |
Deposit Date | May 20, 2024 |
Publicly Available Date | May 20, 2024 |
Journal | Electronics |
Electronic ISSN | 2079-9292 |
Publisher | MDPI |
Peer Reviewed | Peer Reviewed |
Volume | 13 |
Issue | 9 |
Article Number | 1660 |
DOI | https://doi.org/10.3390/electronics13091660 |
Keywords | cognition alignment, zero-shot instance retrieval, large visual language models |
Public URL | https://durham-repository.worktribe.com/output/2437625 |
Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/
You might also like
EfficientTDNN: Efficient Architecture Search for Speaker Recognition
(2022)
Journal Article
Kernelized distance learning for zero-shot recognition
(2021)
Journal Article
A plug-in attribute correction module for generalized zero-shot learning
(2020)
Journal Article
Semantic combined network for zero-shot scene parsing
(2019)
Journal Article