Towards Cognition-Aligned Visual Language Models via Zero-Shot Instance Retrieval

Ma, Teng; Organisciak, Daniel; Ma, Wenbao; Long, Yang

Authors

Teng Ma

Daniel Organisciak

Wenbao Ma

Yang Long

Contributors

Arkaitz Zubiaga
Editor

Abstract

The pursuit of Artificial Intelligence (AI) that emulates human cognitive processes is a cornerstone of ethical AI development, ensuring that emerging technologies can seamlessly integrate into societal frameworks requiring nuanced understanding and decision-making. Zero-Shot Instance Retrieval (ZSIR) stands at the forefront of this endeavour, potentially providing a robust platform for AI systems, particularly large visual language models, to demonstrate and refine cognition-aligned learning without the need for direct experience. In this paper, we critically evaluate current cognition alignment methodologies within traditional zero-shot learning paradigms using visual attributes and word embeddings generated by large AI models. We propose a unified similarity function that quantifies the cognitive alignment level, bridging the gap between AI processes and human-like understanding. Through extensive experimentation, our findings illustrate that this similarity function can effectively mirror the visual–semantic gap, steering the model towards enhanced performance in Zero-Shot Instance Retrieval. Our models achieve state-of-the-art bi-directional image-attribute retrieval accuracy on both the SUN (92.8% and 82.2%) and CUB (59.92% and 48.82%) datasets. This work not only benchmarks the cognition alignment of AI but also sets a new precedent for the development of visual language models attuned to the complexities of human cognition.

Citation

Ma, T., Organisciak, D., Ma, W., & Long, Y. (2024). Towards Cognition-Aligned Visual Language Models via Zero-Shot Instance Retrieval. Electronics, 13(9), Article 1660. https://doi.org/10.3390/electronics13091660

Journal Article Type: Article
Acceptance Date: Apr 16, 2024
Online Publication Date: Apr 25, 2024
Publication Date: Apr 25, 2024
Deposit Date: May 20, 2024
Publicly Available Date: May 20, 2024
Journal: Electronics
Electronic ISSN: 2079-9292
Publisher: MDPI
Peer Reviewed: Peer Reviewed
Volume: 13
Issue: 9
Article Number: 1660
DOI: https://doi.org/10.3390/electronics13091660
Keywords: cognition alignment, zero-shot instance retrieval, large visual language models
Public URL: https://durham-repository.worktribe.com/output/2437625
