WaveletGPT: Wavelets Meet Large Language Models

TMLR Paper 2970 Authors

06 Jul 2024 (modified: 20 Sept 2024) · Rejected by TMLR · CC BY 4.0
Abstract: Large Language Models (LLMs) have ushered in a new wave of artificial intelligence advancements impacting every scientific field and discipline. They are trained on a simple objective: to predict the next token given the previous context. We live in a world where most of the data around us, e.g., text, audio, and music, has a multi-scale structure. This paper infuses LLMs with traditional signal processing ideas, namely wavelets, during pre-training to take advantage of this structure. Without adding any extra parameters to a GPT-style LLM architecture, we achieve the same pre-training performance almost twice as fast on text, raw audio, and symbolic music by imposing a structure on the intermediate embeddings. When trained for the same number of steps, we achieve significant performance gains, comparable to pre-training a much larger neural architecture. Our architecture gives every next-token prediction access to intermediate embeddings at different temporal resolutions in every Transformer decoder layer. This work will hopefully pave the way for incorporating multi-rate signal processing ideas into traditional large language model pre-training. Further, we show that model performance can be pushed by improving internal structure rather than simply pursuing scale.
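To make the mechanism in the abstract concrete, below is a minimal, hypothetical sketch of the core idea: imposing a parameter-free, multi-resolution structure on intermediate decoder embeddings via Haar-style causal moving averages. The function name `multiscale_embeddings`, the chunking of embedding dimensions across scales, and the level count are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def multiscale_embeddings(x: torch.Tensor, num_levels: int = 4) -> torch.Tensor:
    """Sketch: impose a Haar-like multi-scale structure on intermediate
    Transformer embeddings without adding any trainable parameters.

    x: (batch, seq_len, dim) activations from a decoder layer.
    Embedding dimensions are split into `num_levels` chunks; chunk k is
    replaced by a *causal* moving average over a window of 2**k tokens,
    so each position only sees the past and the autoregressive
    next-token objective is preserved.
    """
    _, seq_len, _ = x.shape
    chunks = torch.chunk(x, num_levels, dim=-1)
    out = []
    for k, chunk in enumerate(chunks):
        window = 2 ** k
        if window == 1:
            out.append(chunk)  # finest scale: keep the original embeddings
            continue
        # Causal moving average via cumulative sums (no learned weights).
        csum = torch.cumsum(chunk, dim=1)
        shifted = torch.zeros_like(csum)
        shifted[:, window:] = csum[:, :-window]
        # Early positions average over fewer than `window` tokens.
        counts = torch.arange(1, seq_len + 1, device=x.device,
                              dtype=x.dtype).clamp(max=window)
        out.append((csum - shifted) / counts.view(1, -1, 1))
    return torch.cat(out, dim=-1)
```

In a GPT-style decoder, such an operation would sit between layers, letting each next-token prediction see the residual stream at several temporal resolutions at once; since it is pure averaging, it adds no parameters.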
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
-- Added an Appendix comparing against Exponential Moving Averages, as requested
-- Addressed the concerns of all reviewers
-- Updated figures, algorithm, and equations
-- Added a new section on Long Range Arena
-- Added a new section on the depth and embedding dimension experiment
-- Added a new column on GPU clock time
-- Fixed typos
-- Added references requested by reviewers
-- Added a version with automatically highlighted changes, with Figures 2, 3, and 4 updated
Assigned Action Editor: ~Brian_Kingsbury1
Submission Number: 2970