Abstract: Features automatically extracted from images constitute a new and rich source of semantic knowledge that can complement information extracted from text. The convergence between vision- and text-based information can be exploited in scenarios where the two modalities must be combined to solve a target task (e.g., generating verbal descriptions of images, or finding the right images to illustrate a story). However, the potential applications of integrated visual features go beyond mixed-media scenarios: because of their complementary nature with respect to language, visual features can provide perceptually grounded semantic information that is exploitable even in purely linguistic domains.