Keywords: Manifold Learning, Representation Learning, Variational Autoencoder
TL;DR: We propose to learn motion concepts and global motion correlations of the vocal folds and surrounding tissue from endoscopic images using manifold learning based on a variational autoencoder.
Abstract: Our vision is a motion model of the oscillating vocal folds that can prospectively be used for motion prediction and anomaly detection in laryngeal laser surgery during phonation. In this work, we propose to learn motion concepts and global motion correlations of the vocal folds and surrounding tissue from endoscopic images using manifold learning based on a variational autoencoder. Our experiments show that basic concepts (e.g., distance) are encoded in the latent representation of the learned manifold. The latent embedding also distinguishes a relaxed larynx from a contracted one (during phonation) and allows the stages of phonation to be identified. The sequence of latent variables appears structured and is presumably suited for prediction tasks. Anomalies in the input data are clearly visible in the latent embedding because they lie outside the subspace of the motion manifold. The motion model thus represents a strong prior belief about vocal fold motion. The proposed method is a promising approach to generating motion and oscillation models of the vocal folds and seems feasible for future motion prediction and anomaly detection. A more in-depth assessment with an extension to higher-level models is planned.
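Illustrative sketch (not taken from the paper): the abstract does not give the network architecture, loss weighting, or anomaly criterion. The NumPy snippet below therefore only shows the standard building blocks of a variational autoencoder with a diagonal Gaussian posterior (reparameterized sampling and the KL term against a standard normal prior), plus one assumed way to flag inputs whose latent code lies far from the training embedding. The function names and the nearest-neighbor distance score are hypothetical choices for illustration.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (standard VAE sampling)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Per-sample KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def latent_anomaly_score(mu, train_mu):
    """Assumed anomaly score: Euclidean distance from each latent mean in
    `mu` to its nearest neighbor among the training latents `train_mu`.
    A large value suggests the input lies off the learned manifold."""
    diffs = mu[:, None, :] - train_mu[None, :, :]
    return np.linalg.norm(diffs, axis=-1).min(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in latent means; a real encoder would produce these from images.
    train_mu = rng.standard_normal((100, 2))
    query_mu = np.array([[0.0, 0.0], [8.0, 8.0]])
    print(latent_anomaly_score(query_mu, train_mu))
```

In this toy setup the second query point lies far from all training latents and therefore receives the larger score; any threshold on that score would have to be chosen on validation data.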