TL;DR: This paper presents an extension of the latent language framework to few-shot visual classification, utilising large pre-trained neural networks to enable zero-shot natural language rule generation.
Abstract: The ability to generate rules and hypotheses plays a key role in multiple aspects of human cognition including concept learning and explanation. Previous research has framed this ability as a form of inference via probabilistic program induction. However, this approach requires careful construction of the right grammar and hypothesis space for a particular task. In this work, we propose an alternative computational account of rule generation and concept learning that sidesteps some of these issues. By leveraging advances in multimodal learning and large language models, we extend the latent language framework from Andreas et al. (2017) to work in a zero-shot manner. Taking naturalistic images as input, our computational model is capable of generating candidate rules that are specified in natural language, and verifying them against the observed data. We show that our model can generate, in a zero-shot manner, plausible rules for visual concepts in two domains.
Track: Non-Archival (will not appear in proceedings)