Abstract: Highlights
• TD3Net covers a wide and dense receptive field without blind spots.
• Continuity loss in lip motion caused by blind spots is avoided via adaptive dilation.
• Learns multi-temporal representations effectively across temporal modeling layers.
• Achieves high accuracy with fewer parameters and FLOPs than previous TCN models.
• Delivers significant performance improvements in word-level lipreading.