An Empirical Analysis Towards Replacing Vocabulary-Rigid Embeddings by a Vocabulary-Free Mechanism

Published: 03 Jul 2023, Last Modified: 12 Jul 2023
LXAI @ ICML 2023 Regular Deadline Poster
Keywords: bert, model distillation, transfer learning, word embeddings, transformers, natural language processing
TL;DR: This study proposes a method to replace a vocabulary-rigid transformer model's word-embedding layer with a vocabulary-free one based on a CNN, and finds that cosine-based loss functions yield better results, offering a path to more flexible NLP models.
Abstract: This paper addresses the limitations of subword-based models in NLP by aligning the word-embedding layer of a vocabulary-rigid transformer model to a vocabulary-free one. To do so, a CNN is trained to mimic the word-embedding layer of a BERT model, using a sequence of byte tokens as input. The study compares cosine-based and Euclidean-based loss functions for training the student network and finds that cosine-based metrics yield better results. The research contributes techniques for re-training transformer embedding layers and provides insights into loss-function selection. The findings have implications for developing more flexible and robust NLP models.
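A minimal sketch (assuming PyTorch and Hugging Face Transformers, not the authors' released code) of the distillation setup the abstract describes: a byte-level CNN student is trained to reproduce the output of a frozen BERT word-embedding layer, using the cosine-based loss the paper reports working better than a Euclidean one. The model name, layer sizes, `ByteCNNStudent` class, and the toy word list are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

class ByteCNNStudent(nn.Module):
    """Maps a sequence of UTF-8 byte tokens to a single word-level embedding."""
    def __init__(self, emb_dim: int = 768, byte_dim: int = 64, max_bytes: int = 32):
        super().__init__()
        self.max_bytes = max_bytes
        self.byte_emb = nn.Embedding(256 + 1, byte_dim, padding_idx=256)  # id 256 = PAD
        self.conv = nn.Sequential(
            nn.Conv1d(byte_dim, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(256, emb_dim, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveMaxPool1d(1)

    def forward(self, byte_ids: torch.Tensor) -> torch.Tensor:
        x = self.byte_emb(byte_ids).transpose(1, 2)   # (B, byte_dim, L)
        x = self.conv(x)                              # (B, emb_dim, L)
        return self.pool(x).squeeze(-1)               # (B, emb_dim)

def bytes_of(word: str, max_bytes: int = 32) -> torch.Tensor:
    """Encode a word as padded UTF-8 byte ids."""
    b = list(word.encode("utf-8"))[:max_bytes]
    return torch.tensor(b + [256] * (max_bytes - len(b)))

# Frozen teacher: BERT's input (word-piece) embedding table.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
teacher = AutoModel.from_pretrained("bert-base-uncased").get_input_embeddings()
teacher.requires_grad_(False)

student = ByteCNNStudent(emb_dim=teacher.embedding_dim)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

# Toy batch: words chosen so that one word-piece id corresponds to one word (an assumption).
words = ["house", "river", "music", "garden"]
piece_ids = torch.tensor([tokenizer.convert_tokens_to_ids(w) for w in words])
byte_ids = torch.stack([bytes_of(w) for w in words])

for step in range(100):
    target = teacher(piece_ids)   # teacher (BERT) embeddings
    pred = student(byte_ids)      # student (byte CNN) embeddings
    # Cosine-based loss: 1 - cos(pred, target); the Euclidean baseline
    # would use F.mse_loss(pred, target) instead.
    loss = (1.0 - F.cosine_similarity(pred, target, dim=-1)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The cosine loss only constrains the direction of the student embedding, whereas an MSE loss also penalizes magnitude mismatches; the paper's comparison of the two loss families is what motivates the single-line swap noted in the comment above.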
Submission Type: Archival (to be published in the Journal of LatinX in AI (LXAI) Research)
Submission Number: 11