How Does an Adjective Sound Like? Improving Audio Phrase Composition with Text Embeddings


16 Dec 2023 · ACL ARR 2023 December Blind Submission
Abstract: We learn matrix representations for the most frequent sound-relevant adjectives of English and compose them with vector representations of the nouns they modify. The matrices are learnt jointly from audio and textual data via linear regression (LR) and tensor skipgram (TSG). Their quality is assessed on a novel adjective-noun phrase similarity dataset on two tasks: semantic similarity and audio similarity. Joint learning via TSG outperforms audio-only models, and matrix composition outperforms both addition and non-compositional phrase vectors.
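The abstract treats each adjective as a matrix that acts on noun vectors, with the matrix fitted by linear regression. The following is a minimal pure-Python sketch of that idea. It assumes noun vectors and target phrase vectors are already available and of compatible dimensions, and adds a small ridge term for numerical stability. The ridge term, the function names (`learn_adjective_matrix`, `compose`, `add_compose`), and the toy data are illustrative assumptions, not the authors' implementation. The tensor skipgram (TSG) objective is not covered.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ X = b (a: n x n, b: n x k) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        if abs(p) < 1e-12:
            raise ValueError("system is singular; increase the ridge term")
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def learn_adjective_matrix(nouns: Matrix, phrases: Matrix, ridge: float = 1e-3) -> Matrix:
    """Fit A so that A @ n_i approximates p_i for every (noun, phrase) training pair.

    Assumed setup (hypothetical): nouns stacked as rows of N (m x d), phrase
    targets as rows of P (m x d). The ridge-regularised least-squares solution
    satisfies (N^T N + ridge * I) A^T = N^T P.
    """
    nt = transpose(nouns)
    gram = matmul(nt, nouns)
    for i in range(len(gram)):
        gram[i][i] += ridge
    a_transposed = solve(gram, matmul(nt, phrases))
    return transpose(a_transposed)


def compose(adj_matrix: Matrix, noun: Vector) -> Vector:
    """Matrix composition: phrase vector = A @ n."""
    return [sum(a * x for a, x in zip(row, noun)) for row in adj_matrix]


def add_compose(adj_vector: Vector, noun: Vector) -> Vector:
    """Additive baseline: phrase vector = a + n."""
    return [a + x for a, x in zip(adj_vector, noun)]


if __name__ == "__main__":
    # Synthetic toy data (not from the paper): targets generated by a known matrix.
    true_a = [[2.0, 0.0], [1.0, 1.0]]
    nouns = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    phrases = [compose(true_a, n) for n in nouns]
    learned = learn_adjective_matrix(nouns, phrases, ridge=1e-6)
    print(learned)
    print(compose(learned, [3.0, 1.0]))
```

In this sketch, matrix composition can model an adjective whose effect depends on the noun, because the learned map mixes noun dimensions. Addition, by contrast, shifts every noun by the same adjective vector.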
Paper Type: short
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Contribution Types: NLP engineering experiment, Reproduction study, Data resources
Languages Studied: English
