What to do if language models disagree? Black-box model ensembling for textual and visual question answering
Abstract: A diverse range of large language models (LLMs), e.g., ChatGPT, and visual question answering (VQA) models, e.g., BLIP, have been developed to address textual and visual question answering tasks. However, both LLMs and VQA models encounter challenges when applied to out-of-domain datasets. Fine-tuning these models for domain adaptation is often either impossible (many are accessible only through APIs as black-box models) or computationally expensive (due to their large size), and typically only limited labeled out-of-domain data is available. Under these constraints, ensemble techniques provide a compelling alternative. In this paper, we aim to improve out-of-domain performance by leveraging the capabilities of existing black-box models with limited computational cost and labeled data. To address this challenge, we introduce a novel data-efficient ensemble method, InfoSel, which trains small (<120M parameters) ensemble models that select the best answer for both textual and visual question answering without relying on prediction confidences. Our results demonstrate that InfoSel outperforms the ensembled base models on four mini-datasets sampled from SQuAD-V2, NQ-Open, GQA, and VizWiz.
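The abstract characterizes InfoSel as a small learned selector that picks among black-box model predictions without using their confidence scores. Below is a minimal sketch of what such an answer-selection ensemble could look like, assuming a DistilBERT-style encoder scoring (question, candidate answer) pairs and exact-match supervision; the model names, helper functions, and training setup are illustrative assumptions, not the paper's actual InfoSel implementation.

```python
# Hypothetical sketch of learned answer selection over black-box QA model outputs.
# NOTE: not the paper's InfoSel implementation; encoder choice, supervision signal,
# and all names here are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

ENCODER = "distilbert-base-uncased"  # small (<120M-parameter) encoder, assumed for illustration


class AnswerSelector(nn.Module):
    """Scores each (question, candidate answer) pair and selects the best candidate."""

    def __init__(self):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(ENCODER)
        self.encoder = AutoModel.from_pretrained(ENCODER)
        self.scorer = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, question: str, candidates: list) -> torch.Tensor:
        # One "question [SEP] candidate" sequence per base-model prediction.
        pairs = [f"{question} {self.tokenizer.sep_token} {c}" for c in candidates]
        batch = self.tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")
        cls = self.encoder(**batch).last_hidden_state[:, 0]  # [num_candidates, hidden]
        return self.scorer(cls).squeeze(-1)                  # one score per candidate


def train_step(model, optimizer, question, candidates, gold_answer):
    """Supervise selection with the index of the candidate matching the gold answer."""
    if gold_answer not in candidates:
        return  # skip examples where no base model produced the correct answer
    scores = model(question, candidates)
    target = torch.tensor([candidates.index(gold_answer)])
    loss = nn.functional.cross_entropy(scores.unsqueeze(0), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


# Inference: pick the candidate with the highest learned score;
# no base-model confidence scores are required.
# selector = AnswerSelector()
# best = candidates[selector(question, candidates).argmax().item()]
```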
Paper Type: long
Research Area: Question Answering
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.