Inductive Transformers: How Large Language Models Form Concepts, and How to Make Them Even Better at It

20 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: inductive bias, transformers, encoder, decoder, natural language, large language model, probabilistic graphical models, belief propagation, message passing, open universe, probabilistic program, probabilistic grammar, perturbation convergence experiment, machine learning identifiability, controllability, alignment, neurodiversity, concept learning, generative models
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We present a new approach to designing additional inductive bias into transformers to enable tighter conceptual organization, greater conceptual control, and higher levels of conceptual abstraction.
Abstract: We present a new approach to designing additional inductive bias into transformers to enable tighter conceptual organization, greater conceptual control, and higher levels of conceptual abstraction. This paper is for readers who want to understand why transformers are structured the way they are, and how new variants could be designed for "neuro-diversity" -- to learn differently from the same data. This family of inductive bias requires only modest modifications to transformer activation functions. We explain the approach and present an illustrative simulation.
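To make the abstract's claim concrete, here is a minimal, hypothetical sketch of what "modest modifications to transformer activation functions" could look like: swapping the standard feed-forward activation for a sparsifying variant. The specific bias the paper proposes is not given here, so `SparseTopKActivation` and all parameter choices below are illustrative assumptions, not the authors' method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseTopKActivation(nn.Module):
    """Hypothetical activation: keep only the k largest pre-activations
    per token and zero the rest, one plausible way to bias hidden units
    toward tighter, more discrete concept representations."""

    def __init__(self, k: int):
        super().__init__()
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (..., d_ff); build a 0/1 mask marking the top-k
        # entries along the feature dimension, then gate a GELU with it.
        _, topk_idx = x.topk(self.k, dim=-1)
        mask = torch.zeros_like(x).scatter_(-1, topk_idx, 1.0)
        return F.gelu(x) * mask


class FeedForward(nn.Module):
    """Standard transformer feed-forward block with a swappable activation,
    so changing the inductive bias touches only one component."""

    def __init__(self, d_model: int, d_ff: int, activation: nn.Module):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_ff)
        self.act = activation
        self.w_out = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_out(self.act(self.w_in(x)))


# Usage: drop the sparsified activation in where GELU would normally sit.
ffn = FeedForward(d_model=512, d_ff=2048, activation=SparseTopKActivation(k=64))
x = torch.randn(2, 16, 512)  # (batch, sequence, d_model)
y = ffn(x)                   # same shape as x
```

The design point is that the rest of the architecture (attention, residual connections, layer norms) is untouched; only the nonlinearity changes, which is consistent with the abstract's description of the modification as modest.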
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2683