GenVOG-DiT: A Transformer-Based Diffusion Model for Pose-Driven, Patient-Agnostic Nystagmus VOG Video Generation

30 Nov 2025 (modified: 15 Dec 2025) · MIDL 2026 Conference Submission · CC BY 4.0
Keywords: Video Diffusion, Medical AI, Ophthalmology, Neurology
Abstract: GenVOG-DiT is a diffusion-transformer framework for generating realistic synthetic videos of nystagmus, a condition marked by involuntary, repetitive eye movements that degrade visual acuity and often signal underlying neurological disorders. Progress in deep learning for nystagmus analysis has been limited by the scarcity of publicly available data, since eye-movement patterns can be personally identifiable and raise privacy concerns. GenVOG-DiT addresses this challenge by generating high-fidelity synthetic nystagmus videos that emulate diverse clinical waveform types without relying on real patient data. Leveraging publicly accessible datasets and a transformer-enhanced diffusion process, the model produces extended, clinically meaningful video sequences. The utility of these synthetic videos is validated through performance on downstream tasks evaluated on real patient datasets, demonstrating the potential of GenVOG-DiT as a privacy-preserving resource for nystagmus research.
Primary Subject Area: Image Synthesis
Secondary Subject Area: Generative Models
Submission Number: 183