End-to-end Neuromorphic Lip Reading

Published: 01 Jan 2023 · Last Modified: 05 Mar 2025 · CVPR Workshops 2023 · License: CC BY-SA 4.0
Abstract: Human speech perception is intrinsically a multi-modal task, since speech production requires the speaker to move the lips, producing visual cues in addition to auditory information. Lip reading is the task of visually interpreting lip movements to understand speech, without the use of sound. It is important because it can either complement an audio-based speech recognition system or replace it when sound is unavailable. In this paper we introduce a neuromorphic model for lip reading that takes as input events produced by an event-based sensor capturing lip motion, and classifies short event sequences into word categories using a spiking neural network (SNN) architecture. Experimental results show that the proposed model successfully leverages key advantages of neuromorphic approaches, such as energy efficiency and low latency, which are central features in real-time embedded scenarios. To the best of our knowledge, this is the first end-to-end neuromorphic lip reading model.
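The abstract does not detail the SNN architecture, but the pipeline it describes (event stream in, spiking dynamics, word class out) can be illustrated with a minimal sketch. The code below is a toy, assumption-laden example, not the authors' model: a single layer of leaky integrate-and-fire (LIF) neurons integrates binned sensor events over time, and the predicted word is the output neuron with the highest spike count. All dimensions, parameters, and the synthetic event stream are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def lif_step(v, x, w, decay=0.9, threshold=1.0):
    """One leaky integrate-and-fire step: leak, integrate, spike, hard reset."""
    v = decay * v + x @ w                      # membrane leak + synaptic input
    spikes = (v >= threshold).astype(np.float32)
    v = v * (1.0 - spikes)                     # reset neurons that fired
    return v, spikes

# Hypothetical sizes: 64 input channels (spatially binned events), 10 word classes.
n_in, n_out, n_steps = 64, 10, 50
w = rng.normal(scale=0.3, size=(n_in, n_out)).astype(np.float32)

# Synthetic event stream: binary events per time bin (stand-in for sensor output).
events = (rng.random((n_steps, n_in)) < 0.1).astype(np.float32)

v = np.zeros(n_out, dtype=np.float32)
spike_counts = np.zeros(n_out, dtype=np.float32)
for t in range(n_steps):
    v, s = lif_step(v, events[t], w)
    spike_counts += s

# Rate-coded readout: the most active output neuron is the predicted word class.
pred = int(np.argmax(spike_counts))
```

In a real system the weights would be trained (e.g. with surrogate-gradient methods commonly used for SNNs), and the event stream would come from the event-based camera rather than random data.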
