Abstract: MoE models and other complex routing methods have garnered considerable attention, but training such models is costly and difficult to distribute. This paper shows that simple hard routing holds substantial promise for tool-use QA, enabling more distributed and decentralised training, with potential applications elsewhere. The task follows a proposed pipeline that separates the ReAct stages into separately trained models: Planner, Caller and Summariser. Previous approaches have largely relied on zero-shot comprehension of specific APIs from the provided documentation. The zero-shot abilities of small models are limited, however, and studies show that failures most commonly occur at the Caller stage of the pipeline. This study therefore moves away from zero-shot assumptions, adopting a hard-routing strategy that uses expert adapters for each category of APIs. Experiments show that this pipeline allows a 7-billion-parameter model to beat much larger, modern, closed-source models used zero-shot on this task.
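The hard-routing idea in the abstract can be sketched minimally as follows. All names here (`API_CATEGORIES`, `EXPERT_ADAPTERS`, `route`) are hypothetical illustrations, not the paper's actual implementation; the real pipeline routes to separately trained expert adapters for the Caller stage.

```python
# Hypothetical sketch of hard routing: pick exactly one expert adapter
# per query based on its API category (no soft mixture of experts).
# Category keywords and adapter names are illustrative assumptions.

API_CATEGORIES = {
    "weather": ["forecast", "temperature", "rain"],
    "finance": ["stock", "price", "exchange"],
}

# One expert adapter per API category, e.g. a LoRA checkpoint name.
EXPERT_ADAPTERS = {cat: f"caller-adapter-{cat}" for cat in API_CATEGORIES}

def route(query: str) -> str:
    """Hard routing: score each category by keyword overlap with the
    query and return the single best-matching expert adapter."""
    scores = {
        cat: sum(kw in query.lower() for kw in kws)
        for cat, kws in API_CATEGORIES.items()
    }
    best = max(scores, key=scores.get)
    return EXPERT_ADAPTERS[best]
```

Because the router commits to one adapter per query, each adapter can be trained independently on its own API category, which is what makes the training distributable.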
Paper Type: Short
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: open-domain QA, parameter-efficient training, retrieval-augmented generation, NLP in resource-constrained settings, few-shot QA, conversational QA
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings (efficiency)
Languages Studied: English
Submission Number: 2544