Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Logic, Language, Memory, Mechanistic, Framework, LLM, Generative AI
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We designed a framework to test language models on various aspects of language in a controllable way. Based on the results, we designed a new architecture that combines the benefits of attention and the Hyena operator.
Abstract: Large Language Models have achieved superhuman breadth of performance with unprecedented depth. At the same time, these models are largely black boxes, and the mechanisms underlying their performance have been evaluated using synthetic or mechanistic schemes. We extend current mechanistic schemes to incorporate Logic, Memory, and nuances of Language such as latent structure. The proposed framework is called LOLAMEME, and we provide two instantiations of LOLAMEME: the LoLa and MeMe languages. We then consider two generative language model architectures: transformer-based GPT-2 and convolution-based Hyena. We propose the hybrid architecture THEX and use the LOLAMEME framework to compare the three architectures. THEX outperforms GPT-2 and Hyena on select tasks.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2772