Towards Explainable and Efficient Multi-Modality Learning: Domain-Agnostic Concept Space Paired with Domain-Specific Projection Models

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Concept Learning, Multi-Modality Model, Probabilistic Reasoning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: In an effort to create a more explainable AI system, we introduce a novel multi-modality learning framework in this study. The framework combines a domain-agnostic concept space, designed to be transparent and interpretable, with a set of domain-specific projection models that process distinct modality inputs and map them onto this concept space. Separating the concept space from the projection models makes the framework versatile, allowing easy adaptation to new modalities and downstream tasks. We evaluate our framework in a zero-shot setting on two popular tasks: Image-Text Matching and Visual Question Answering. It achieves performance on par with fine-tuned benchmark models on these tasks while maintaining an explainable architecture.
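To make the described separation concrete, the sketch below illustrates one way domain-specific projection models could map different modality inputs onto a shared, human-readable concept space and be compared for zero-shot image-text matching. All names (e.g., `ConceptProjector`, `concept_names`) and design details are hypothetical assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: per-modality projectors mapping encoder features onto a
# shared, interpretable concept space (all names are illustrative only).
import torch
import torch.nn as nn


class ConceptProjector(nn.Module):
    """Projects one modality's features onto scores over named concepts."""

    def __init__(self, input_dim: int, num_concepts: int):
        super().__init__()
        self.proj = nn.Linear(input_dim, num_concepts)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Sigmoid keeps each concept score in [0, 1], so every dimension can
        # be read as "how strongly this concept is present" in the input.
        return torch.sigmoid(self.proj(features))


# A small, domain-agnostic concept vocabulary shared by all modalities.
concept_names = ["person", "animal", "vehicle", "indoor", "outdoor", "text"]

# One projector per modality; only these components are modality-specific.
image_projector = ConceptProjector(input_dim=512, num_concepts=len(concept_names))
text_projector = ConceptProjector(input_dim=768, num_concepts=len(concept_names))

# Zero-shot image-text matching: project both inputs into the concept space
# and score their agreement there (cosine similarity over concept scores).
image_features = torch.randn(1, 512)   # stand-in for an image encoder output
text_features = torch.randn(1, 768)    # stand-in for a text encoder output

image_concepts = image_projector(image_features)
text_concepts = text_projector(text_features)
match_score = nn.functional.cosine_similarity(image_concepts, text_concepts)

print(dict(zip(concept_names, image_concepts.squeeze().tolist())))
print("match score:", match_score.item())
```

Because both modalities land in the same named concept space, the per-concept scores can be inspected directly, which is what would make a matching decision explainable under this kind of design.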
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6383