Weak Tokenization: A Preliminary Study of Dynamic Audio Chunking for Irregular Music Generation

Published: 08 Sept 2025, Last Modified: 10 Sept 2025 · LLM4Music @ ISMIR 2025 Poster · CC BY 4.0
Keywords: Music Generation; Tokenization of Music
TL;DR: We investigate whether LLMs can move beyond rigid tokenization to generate aesthetically irregular music like IDM, using dynamic chunking and a learnable complexity objective.
Abstract: Most large language models (LLMs) for music generation rely on strong tokenization, discretizing audio into fixed, uniform units. While effective for producing stylistically coherent outputs, such models struggle with genres like IDM and Glitch, where irregularity is central to the aesthetic. Inspired by tokenizer-free trends in NLP, we investigate the potential of an alternative framework combining: (1) a Dynamic Chunking mechanism that segments audio based on content similarity rather than fixed grids, and (2) the L-Score, a learnable complexity metric spanning timbral, rhythmic, and structural dimensions. Preliminary results indicate that while the model captures some spectral features, it fails to achieve rhythmic control, generating chaotic rather than deliberately irregular patterns. This limitation motivates future work on modeling controlled deviance in music generation, moving beyond statistical complexity toward learnable representations of aesthetic misdirection and expectation violation.
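To make the Dynamic Chunking idea concrete, the following is a minimal sketch of similarity-based segmentation: chunk boundaries are placed wherever the cosine similarity between consecutive frame embeddings drops below a threshold. The function name, the threshold rule, and the toy embeddings are all illustrative assumptions; the paper's actual mechanism is learned rather than hand-tuned.

```python
import numpy as np

def dynamic_chunks(frames, threshold=0.9):
    """Segment a sequence of frame embeddings into variable-length chunks.

    A new chunk starts whenever cosine similarity between consecutive
    frames falls below `threshold` (hypothetical boundary rule, standing
    in for the paper's learned chunking mechanism).
    """
    boundaries = [0]
    for i in range(1, len(frames)):
        a, b = frames[i - 1], frames[i]
        sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
        if sim < threshold:
            boundaries.append(i)
    boundaries.append(len(frames))
    return [frames[s:e] for s, e in zip(boundaries, boundaries[1:])]

# Toy example: two homogeneous regions separated by an abrupt timbral change
frames = np.array([[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 3)
chunks = dynamic_chunks(frames)
print([len(c) for c in chunks])  # chunk lengths follow content, not a fixed grid: [4, 3]
```

Unlike fixed-grid tokenization, the chunk lengths here adapt to where the content actually changes, which is the property the framework aims to exploit for irregular material.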
Submission Number: 5