Abstract: Emergent Language (EL) research studies how communication arises among artificial agents. Although symbolic communication channels more closely mirror the discrete nature of human language, learning such protocols remains fundamentally difficult because symbol sampling is non-differentiable. Existing approaches typically rely on high-variance gradient estimators such as REINFORCE or on continuous relaxations such as Gumbel–Softmax, both of which limit training stability and scalability when a language must be learned from scratch. Motivated by cognitive theories that emphasize intrapersonal processes preceding communication, we explore self-play as a substrate for language emergence prior to mutual interaction. We introduce Vector Quantized Emergent Language (VQEL), a novel architecture that incorporates vector quantization into the message generation process. VQEL enables agents to perform self-play using discrete internal representations drawn from a learned codebook while preserving end-to-end differentiability. By grounding the vocabulary through dense gradients in self-play, VQEL avoids the cold-start instability of reinforcement learning. The resulting vector-quantized codebook naturally induces a symbolic vocabulary that serves as a robust initialization for subsequent REINFORCE-based fine-tuning during mutual play with other agents. Empirical results show that agents pretrained via VQEL self-play achieve more consistent symbol alignment and higher task success when later engaged in mutual interaction. These findings position self-play as a principled and effective mechanism for learning discrete communication protocols, addressing key optimization and representational challenges in emergent language systems.
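The quantization step the abstract alludes to can be illustrated with a minimal sketch. This is not the paper's implementation; codebook size, embedding dimension, and variable names are illustrative assumptions. Each continuous sender representation is snapped to its nearest codebook entry, whose index is the emitted discrete symbol; a straight-through trick (noted in the comments) is what keeps the pipeline differentiable in practice.

```python
import numpy as np

# Hypothetical sketch of VQ-style message generation (sizes are illustrative).
rng = np.random.default_rng(0)

codebook = rng.normal(size=(16, 8))   # 16 candidate "symbols", 8-dim embeddings
z = rng.normal(size=(8,))             # continuous sender representation

# Nearest-codebook lookup: the chosen index is the emitted discrete symbol.
dists = np.linalg.norm(codebook - z, axis=1)
symbol = int(np.argmin(dists))
z_q = codebook[symbol]

# Straight-through idea (conceptual): in an autodiff framework one would use
# z_q = z + stop_gradient(z_q - z), so the backward pass treats quantization
# as the identity and dense gradients reach the encoder during self-play.
print(symbol, z_q.shape)
```

The lookup itself is non-differentiable; the straight-through substitution in the final comment is the standard device for passing gradients across it.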
Submission Type: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=OI1OJgPzno
Changes Since Last Submission: **Clarification of the Core Claim and Motivation:**
We have significantly rewritten the Abstract, Introduction, and Method sections to make our primary claim much more transparent. We explicitly clarify that VQEL is designed to build a highly structured, grounded foundational language via self-play prior to multi-agent interaction. We emphasize that this internally developed language serves as an exceptionally robust starting point: during mutual play, the sender’s language is so well-formed that its parameters can be strictly frozen (requiring only the receiver to adapt). Alternatively, it can be used as a highly stable initialization for REINFORCE-based fine-tuning, thereby completely bypassing the cold-start instability and high variance typically associated with learning discrete communication from scratch. This clarification explicitly addresses previous scrutiny regarding the combination of VQ and policy gradient optimization.
**Streamlined Mathematical Notation:**
To address concerns regarding readability and crowded notation, we have carefully reorganized the mathematical formulation in the Method section. We reduced the number of redundant equations and systematically cleaned up the superscripts and subscripts to make the formulation cleaner and easier to follow. Crucially, we ensured that these simplifications enhance the text's flow without compromising mathematical rigor.
Assigned Action Editor: ~Baoxiang_Wang1
Submission Number: 8352