Adaptive vision transformer for enhanced perception in visual prostheses
Abstract: Prosthetic vision has emerged as a promising solution for restoring sight in visually impaired individuals. However, the resulting perceptions are often suboptimal, which prevents users from performing daily tasks effectively. Recent studies have shown that both physical constraints and anatomical characteristics contribute to phosphene distortions, highlighting the need for a personalized approach to improve the user experience. In this context, integrating deep learning-based strategies with prosthetic models and patient-specific information has demonstrated strong potential for generating more useful perceptions. Our approach improves upon previous methods by introducing a novel neural network architecture that incorporates a vision transformer to analyze both the visual input and patient-specific parameters, aiming to reduce distortions by optimizing the stimulation parameters. Additionally, we develop geometric transformations to correct rotations and translations within the implant’s field of view. The proposed model outperforms baseline methods on the MNIST dataset and sets a new baseline for more complex images, generating perceptions suitable for classification tasks on the ImageNet, CIFAR-10, and COCO datasets.