Birdie: Advancing State Space Models with a Minimalist Architecture and Novel Pre-training Objectives

ACL ARR 2024 June Submission3533 Authors

16 Jun 2024 (modified: 02 Jul 2024) · License: CC BY 4.0
Abstract: State Space Models (SSMs) are emerging as alternatives to Transformers but struggle with tasks that require long-range interactions, such as text copying and multi-query associative recall. Most improvements to SSMs focus on the internal architecture rather than exploring diverse pre-training objectives. This paper introduces Birdie, a minimalist SSM architecture paired with novel pre-training objectives. Experimental evaluations demonstrate that combining this minimalist architecture, which offers refined control over recurrence parameterization, with pre-training objectives such as infilling, copying, and deshuffling significantly improves performance on practical generative tasks, yielding higher average metric scores and win rates. The findings offer valuable insights for optimizing SSMs to compete with Transformers.
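To make the three objectives named in the abstract concrete, the sketch below shows one plausible way to turn raw text into (input, target) pairs for infilling, copying, and deshuffling. This is a hedged illustration, not the paper's implementation: the sentinel and marker tokens, span length, and helper names are all assumptions.

```python
# Minimal sketch (not the paper's code) of constructing (input, target) pairs
# for infilling, copying, and deshuffling pre-training objectives.
import random

SENTINEL = "<extra_id_0>"  # assumed sentinel token marking the masked span


def infilling_pair(tokens, span_len=3):
    """Mask a contiguous span; the model must reconstruct the missing span."""
    start = random.randrange(max(1, len(tokens) - span_len))
    span = tokens[start:start + span_len]
    inp = tokens[:start] + [SENTINEL] + tokens[start + span_len:]
    return inp, [SENTINEL] + span


def copying_pair(tokens):
    """Ask the model to reproduce its input verbatim (long-range copying)."""
    return tokens + ["<copy>"], tokens


def deshuffling_pair(tokens):
    """Shuffle the input tokens; the model must restore the original order."""
    shuffled = tokens[:]
    random.shuffle(shuffled)
    return shuffled + ["<unshuffle>"], tokens


if __name__ == "__main__":
    text = "state space models struggle with long range recall".split()
    for make_pair in (infilling_pair, copying_pair, deshuffling_pair):
        x, y = make_pair(text)
        print(make_pair.__name__, "->", x, "|", y)
```

Objectives of this kind are typically mixed with standard next-token prediction during pre-training; the exact mixture and token conventions used by Birdie are described in the paper itself.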
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Language Modeling, Machine Learning for NLP, Resources and Evaluation, Generation, Interpretability and Analysis of Models for NLP, Efficient/Low-Resource methods for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings/efficiency, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 3533