Meaning Representations as Variational Quantum Circuits
Keywords: Quantum Machine Learning, Syntax, Semantics, Vision-Language Models, Machine Learning, ICML
TL;DR: Parameter-efficient variational quantum circuits grounded in linguistic composition for contrastive vision-language captioning.
Abstract: Large vision-language models exhibit a "compositionality gap" because they rely on unstructured statistical approximations to capture complex relations. We propose CCG-VQC, a quantum framework that explicitly maps linguistic syntax into parametrised quantum circuits. To separate the impact of architectural design from scale, we introduce MicroCLIP, a parameter-matched classical transformer. CCG-VQC achieves 71.19% on ARO-Attribution, whereas MicroCLIP struggles to surpass random chance (50.85%) and even standard CLIP lags behind at 61.00%. These results indicate that, in low-parameter regimes, explicit linguistic structure outperforms statistical approximation.
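As a rough illustration of what mapping syntax into parametrised circuits can look like, the toy sketch below simulates a single qubit with NumPy. It is not the CCG-VQC architecture: the function names, the one-qubit encoding, and the choice of a single Ry rotation per word are all assumptions made for illustration. A noun prepares a state, an adjective acts on that state as a parametrised gate, and the phrase is scored against a target state by fidelity.

```python
import numpy as np

def ry(theta):
    """Single-qubit Ry rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def noun_state(theta):
    """Hypothetical noun encoding: rotate |0> by a trainable angle."""
    return ry(theta) @ np.array([1.0, 0.0])

def adjective_gate(phi):
    """Hypothetical adjective encoding: a trainable gate that modifies a noun."""
    return ry(phi)

def phrase_state(adj_phi, noun_theta):
    """Compose 'adjective noun' by applying the adjective gate to the noun state."""
    return adjective_gate(adj_phi) @ noun_state(noun_theta)

def fidelity(a, b):
    """Squared overlap |<a|b>|^2, used here as a similarity score."""
    return float(np.abs(np.vdot(a, b)) ** 2)

# Illustrative angles only; in a trained model these would be learned parameters.
phrase = phrase_state(adj_phi=0.3, noun_theta=0.4)
target = noun_state(0.7)       # stand-in for an image-derived state (assumption)
score = fidelity(phrase, target)
```

In this toy, each word contributes one trainable angle, which shows why such models can stay small. A contrastive setup would raise the score for matching image-caption pairs and lower it for mismatched ones.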
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 199