Characterizations of Language Generation With Breadth

Published: 24 Dec 2024, Last Modified: 27 Jan 2025, arXiv, Everyone, CC BY 4.0
Abstract: We study language generation in the limit, which was introduced by Kleinberg and Mullainathan [KM24] building on classical works of Gold [Gol67] and Angluin [Ang79]. The result of [KM24] is an algorithm for generating from any countable language collection in the limit. While their algorithm eventually generates strings from the target language K, it sacrifices breadth, i.e., its ability to output all strings in K. The main open question of [KM24] was whether this trade-off between consistency and breadth is necessary for language generation. Recent work by Kalavasis, Mehrotra, and Velegkas [KMV24] proposed three definitions for consistent language generation with breadth in the limit: generation with exact breadth, generation with approximate breadth, and unambiguous generation. Concurrent and independent work by Charikar and Pabbaraju [CP24a] introduced a different notion, called exhaustive generation. Both of these works explore when language generation with (different notions of) breadth is possible. In this work, we fully characterize language generation for all these notions of breadth and their natural combinations. Building on [CP24a; KMV24], we give an unconditional lower bound for generation with exact breadth, removing a technical condition needed in [KMV24] and extending the unconditional lower bound of [CP24a] which holds for specific collections; our result shows that generation with exact breadth is characterized by Angluin’s condition for identification from positive examples [Ang80]. Furthermore, we introduce a weakening of Angluin’s condition and show that it tightly characterizes both generation with approximate breadth and exhaustive generation, thus showing that these two notions are equivalent. Moreover, we show that Angluin’s condition further characterizes unambiguous generation in the limit as a corollary of a more general result that applies to a family of notions of breadth.
We discuss the implications of our results in the statistical setting of Bousquet, Hanneke, Moran, van Handel, and Yehudayoff [BHMvY21]. Finally, we provide unconditional lower bounds for stable generators, strengthening the results of [KMV24], and we show that for stable generators all the aforementioned notions of breadth are characterized by Angluin’s condition. This gives a separation for generation with approximate breadth, between stable and unstable generators.