TL;DR: Sound-guided manipulation of paintings.
Abstract: If a picture is worth a thousand words, sound may cost a million. Moreover, it is difficult to describe the nuances and complexities of sound accurately in words. Recent robotic painting and other image synthesis methods have made progress in generating visuals from language inputs, but the translation of sound into images remains largely unexplored. Audio data has the potential to broaden accessibility, give the user more control, and convey complex emotions and the dynamic aspects of the real world. Here, we propose to extend the recent robotic painting framework, FRIDA, with an additional generalized sound-semantics step that encodes sound into the image-text embedding space and guides the painting planning process that controls the robot. We illustrate how our approach may be used in conjunction with existing modalities to create paintings that adhere to the guiding semantics while also enhancing the user's control. While sound guidance has been used in image manipulation, few existing works use sound inputs to create general image content. In this paper, we share preliminary qualitative results.
Submission Type: non-archival
Presentation Type: onsite
Presenter: Peter Schaldenbrand