Keywords: Activation function, deep learning, computer vision applications
Abstract: Activation functions govern how recurrent networks regulate and transmit information across temporal dependencies. Despite advances in sequence modelling, gated recurrent units (GRUs) still depend on the standard sigmoid and tanh nonlinearities, which can produce weak gate separation and unstable learning, particularly when training data are limited. We introduce squared sigmoid-tanh (SST), a parameter-free activation that squares the gate nonlinearity to increase the contrast between near-zero and high activations, thereby promoting sharper information filtering during GRU updates. We incorporate SST into GRU gating and evaluate it in low-data settings spanning sign language recognition, human activity recognition, and time-series forecasting and classification. Across tasks, SST-GRU consistently surpasses the standard sigmoid/tanh GRU, with the largest improvements in the smallest-data domains and negligible added computational cost. We further examine gate activation statistics and training dynamics, showing that SST improves training stability, which aligns with its performance gains in data-scarce settings. SST is a parameter-free modification that complements more complex architectural advances by improving gating selectivity in low-data sequence learning.
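The contrast effect described in the abstract can be illustrated with a small numerical sketch. The abstract does not give the exact SST formula, so the code below assumes the simplest reading for the sigmoid gate only, namely sigmoid(x) squared; the function names `squared_sigmoid` and `contrast` and the sample pre-activations are hypothetical and chosen purely for illustration, not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def squared_sigmoid(x: float) -> float:
    # Assumption: "squaring the gate nonlinearity" is read here as
    # sigmoid(x) ** 2. The paper's exact SST definition may differ.
    return sigmoid(x) ** 2


def contrast(gate, low: float, high: float) -> float:
    """Ratio of the gate value at a high pre-activation to a low one."""
    return gate(high) / gate(low)


# Hypothetical pre-activations for a mostly closed and a mostly open gate.
low, high = -2.0, 2.0

plain = contrast(sigmoid, low, high)
squared = contrast(squared_sigmoid, low, high)

print(f"sigmoid contrast:         {plain:.2f}")
print(f"squared sigmoid contrast: {squared:.2f}")
```

Under this assumption, squaring leaves the gate output in (0, 1) but squares the ratio between open and closed gate values, which is one way to read the "sharper information filtering" claim.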
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 6