Bit Cipher — A Simple yet Powerful Word Representation System that Integrates Efficiently with Language Models

23 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: representation learning, word embedding, efficient NLP, language model training, fine-tuning
TL;DR: We introduce Bit-Cipher, a novel word representation system that leverages contextual information and a unigram-frequency-based dimensionality reduction method, providing strong interpretability alongside efficiency.
Abstract: While Large Language Models (LLMs) become ever more dominant, classic pre-trained word embeddings sustain their relevance through computational efficiency and nuanced linguistic interpretation. Drawing on recent studies demonstrating that the optimizations of GloVe and word2vec _all_ converge towards variants of the log-co-occurrence matrix, we construct a novel word representation system called _**Bit-cipher**_ that eliminates the need for backpropagation while leveraging contextual information and hyper-efficient dimensionality reduction techniques based on unigram frequency, providing strong interpretability alongside efficiency. We use the bit-cipher algorithm to train word vectors via a two-step process that critically relies on a hyperparameter---_bits_---that controls the vector dimension. While the first step trains the bit-cipher, the second utilizes it under two different aggregation modes---_summation_ or _concatenation_---to produce contextually rich representations from word co-occurrences. We extend our investigation into bit-cipher's efficacy, performing probing experiments on part-of-speech (POS) tagging and named entity recognition (NER) to assess its competitiveness with classic embeddings like word2vec and GloVe. Additionally, we explore its applicability to LM training and fine-tuning. When embedding layers are replaced with cipher embeddings, our experiments illustrate the notable efficiency of cipher embeddings in accelerating the training process and attaining better optima than conventional training paradigms. In fine-tuning experiments, training cipher embeddings on the target datasets and replacing the embedding layers of the LMs to be fine-tuned negates the need for extensive model adjustments, offering a highly efficient transfer learning alternative. Experiments integrating bit-cipher embedding layers with RoBERTa, T5, and OPT, prior to or as a substitute for fine-tuning, showcase a promising enhancement to transfer learning, allowing rapid model convergence while preserving competitive performance.
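To make the two-step procedure described in the abstract more concrete, the following is a minimal, hypothetical sketch of how such a pipeline could look in Python. The frequency-bucket encoding, the window size, the function names, and every detail beyond what the abstract states are illustrative assumptions, not the authors' actual algorithm.

```python
# Hypothetical sketch of the two-step bit-cipher pipeline described in the
# abstract; the encoding scheme and aggregation details are assumptions.
from collections import Counter
import numpy as np

def train_bit_cipher(corpus, bits=64):
    """Step 1 (assumed): assign each word a `bits`-dimensional code
    driven by its unigram-frequency rank."""
    freqs = Counter(w for sent in corpus for w in sent)
    vocab = [w for w, _ in freqs.most_common()]
    codes = {}
    for rank, w in enumerate(vocab):
        # Placeholder for the paper's unigram-frequency-based reduction:
        # bucket each word by frequency rank into one of `bits` slots.
        vec = np.zeros(bits)
        vec[rank % bits] = 1.0
        codes[w] = vec
    return codes

def embed(corpus, codes, window=2, mode="summation"):
    """Step 2 (assumed): aggregate the cipher codes of co-occurring
    context words, by summation or concatenation, into word vectors."""
    dim = len(next(iter(codes.values())))
    reps = {w: [] for w in codes}
    for sent in corpus:
        for i, w in enumerate(sent):
            ctx = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
            ctx_vecs = [codes[c] for c in ctx if c in codes]
            if not ctx_vecs:
                continue
            if mode == "summation":
                reps[w].append(np.sum(ctx_vecs, axis=0))
            else:  # "concatenation": fixed context slots, zero-padded
                slots = ctx_vecs[:2 * window]
                slots += [np.zeros(dim)] * (2 * window - len(slots))
                reps[w].append(np.concatenate(slots))
    return {w: np.mean(v, axis=0) for w, v in reps.items() if v}
```

Under this reading, the resulting vectors could then be loaded into a language model's embedding layer (e.g., with `torch.nn.Embedding.from_pretrained`) before training or fine-tuning; the exact integration with RoBERTa, T5, and OPT is described only in the full paper.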
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7569