Juliet: Per-Sample Conditional Branching for Efficient Convolutional Networks

TMLR Paper 6598 Authors

21 Nov 2025 (modified: 03 Dec 2025) · Under review for TMLR · CC BY 4.0
Abstract: We introduce Juliet, a dynamic, trie-augmented neural architecture that improves the efficiency of convolutional neural networks by routing each input through learned per-node branches while growing and pruning capacity on the fly. Each node pairs a lightweight sub-module with a transformer-based path selector trained end-to-end; growing and pruning based on exponential moving average (EMA) usage statistics let the model expand or contract during training to preserve accuracy within compute and memory budgets. We graft Juliet onto ResNet-18, EfficientNet-B0, and DenseNet-121 and train on CIFAR-10 (ARCHER2), with an ImageNet/H100 check using ResNet-101. On CIFAR-10, Juliet reduces theoretical training and inference FLOPs, even when the parameter count increases. The results show inference-FLOP reductions of $\sim21\%$ (ResNet-18), $\sim68\%$ (EfficientNet-B0), and $\sim70\%$ (DenseNet-121), while staying within $\sim1\%$ Top-1 of the baseline for ResNet-18 and DenseNet-121, with a larger trade-off on EfficientNet-B0. At ImageNet scale, Juliet-101 achieves $27.1$ Top-1 per GFLOP, outperforming SkipNet, ConvNet-AIG, and BlockDrop. Ablations and hyperparameter sweeps (growth/prune thresholds, prune interval, prebuild limit) reveal nuances in Juliet's architecture, and simpler routers (e.g., a small MLP) match transformer routing, indicating that a transformer router is not a prerequisite for competitive accuracy. Overall, Juliet provides a flexible, interpretable approach to conditional computation for convolutional neural networks, improving the efficiency–accuracy trade-off for the CNNs we evaluate.
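To make the EMA-usage-based grow/prune mechanism described in the abstract concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): it tracks per-branch routing frequency with an exponential moving average and applies assumed growth/prune thresholds, a prune interval, and a per-node capacity limit. All constants and names (`EMA_DECAY`, `GROW_THRESHOLD`, `PRUNE_THRESHOLD`, `MAX_BRANCHES`) are illustrative assumptions.

```python
# Minimal sketch of EMA-usage bookkeeping for per-node branches.
# NOT the paper's code; thresholds, decay, and capacity limit are assumed.

from dataclasses import dataclass, field
from typing import List

EMA_DECAY = 0.99          # assumed smoothing factor for usage statistics
GROW_THRESHOLD = 0.6      # hypothetical: add a branch when usage is this high
PRUNE_THRESHOLD = 0.05    # hypothetical: drop a branch when usage is this low
MAX_BRANCHES = 4          # hypothetical per-node capacity (prebuild) limit


@dataclass
class Branch:
    name: str
    ema_usage: float = 0.0  # exponential moving average of routing frequency


@dataclass
class Node:
    branches: List[Branch] = field(default_factory=list)

    def update_usage(self, chosen: int) -> None:
        """After routing one sample, decay all branches and bump the chosen one."""
        for i, b in enumerate(self.branches):
            hit = 1.0 if i == chosen else 0.0
            b.ema_usage = EMA_DECAY * b.ema_usage + (1.0 - EMA_DECAY) * hit

    def grow_and_prune(self) -> None:
        """Expand or contract capacity from EMA usage (called every prune interval)."""
        # Prune rarely used branches, but always keep at least one.
        keep = [b for b in self.branches if b.ema_usage >= PRUNE_THRESHOLD]
        self.branches = keep or self.branches[:1]
        # Grow a new branch next to an over-used one, up to the capacity limit.
        if len(self.branches) < MAX_BRANCHES and any(
            b.ema_usage > GROW_THRESHOLD for b in self.branches
        ):
            self.branches.append(Branch(name=f"branch_{len(self.branches)}"))


if __name__ == "__main__":
    node = Node(branches=[Branch("branch_0"), Branch("branch_1")])
    # Simulate router decisions that favour branch 0 about 90% of the time.
    for step in range(1, 2001):
        node.update_usage(chosen=0 if step % 10 else 1)
        if step % 500 == 0:  # assumed prune interval
            node.grow_and_prune()
    print([(b.name, round(b.ema_usage, 3)) for b in node.branches])
```

In the paper itself the routing decision comes from the learned path selector and the branches carry convolutional sub-modules; the sketch only illustrates how EMA usage can drive growth and pruning within a compute/memory budget.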
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Omar_Rivasplata1
Submission Number: 6598