Keywords: Continual Learning, Language Grounding, Language-Image Embeddings, Multimodal Distributional Semantics, Reference Resolution
Abstract: This paper presents CoLLIE: a simple yet effective model for continual learning of how language is grounded in vision. Given a pre-trained multimodal embedding model, where language and images are projected into the same semantic space (in this case CLIP by OpenAI), CoLLIE learns a transformation function that adjusts the language embeddings when needed to accommodate new language use. Unlike traditional few-shot learning, the model does not just learn new classes and labels, but can also generalize to similar language use. We verify the model's performance on two different tasks of continual learning and show that it can efficiently learn and generalize from only a few examples, while interfering only minimally with the model's original zero-shot performance.
One-sentence Summary: We present a new model for continual learning of how language is grounded in vision, through multimodal embeddings.
Supplementary Material: zip
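The core idea from the abstract, adjusting frozen text embeddings within CLIP's shared space, can be sketched roughly as follows. This is a minimal illustration only: the module shape, the gating mechanism, and the cosine-similarity training loss are assumptions for the sketch, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EmbeddingAdjuster(nn.Module):
    """Hypothetical CoLLIE-style transformation: maps a frozen text embedding
    to an adjusted embedding in the same space. A learned gate keeps embeddings
    that do not involve the new language use close to their original values,
    limiting interference with zero-shot behavior."""

    def __init__(self, dim: int = 512, hidden: int = 256):
        super().__init__()
        self.adjust = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )
        self.gate = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, text_emb: torch.Tensor) -> torch.Tensor:
        g = self.gate(text_emb)                  # how strongly to adjust (0..1)
        adjusted = text_emb + g * self.adjust(text_emb)
        return F.normalize(adjusted, dim=-1)     # stay on the unit sphere, as CLIP embeddings are


def train_step(model, optimizer, text_emb, image_emb):
    """Illustrative few-shot update: pull the adjusted text embedding toward
    image embeddings that exemplify the new language use."""
    optimizer.zero_grad()
    out = model(text_emb)
    loss = 1.0 - F.cosine_similarity(out, image_emb, dim=-1).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, only the adjuster is trained while the underlying CLIP encoders stay frozen, which is one plausible way to learn from a handful of examples without overwriting the pre-trained model.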