FlowHN: Adaptive Token Routing for Efficient Parallel Hybrid Networks

Published: 18 Apr 2026, Last Modified: 25 Apr 2026
ACL 2026 Industry Track Poster
License: CC BY 4.0
Keywords: Deployment-Efficient LLMs, Throughput Optimization, Systems-Efficient NLP, Model FLOP Utilization
Abstract: Production LLMs must balance modeling quality with predictable latency, stable accelerator utilization, and cost-efficient scaling—constraints that remain difficult for existing architectures. Transformers provide strong reasoning but incur quadratic complexity, while state-space models (SSMs) scale efficiently yet lack fine-grained token interactions; prior hybrids either introduce sequential bottlenecks or rely on learned routing that complicates deployment. We present FlowHN, a deployment-oriented parallel hybrid architecture that enables deterministic conditional computation via FLOP-aware token circulation across attention and SSM branches. Instead of dynamic expert routing, FlowHN performs hardware-aligned token scheduling that balances workloads, reduces synchronization stalls, and preserves full parameter utilization. Across 135M–1B models, FlowHN achieves up to 4× higher throughput and 15% higher MFU than strong Transformer, SSM, and hybrid baselines while maintaining competitive accuracy on reasoning, coding, and long-context tasks up to 32K tokens. FlowHN is designed to integrate directly into existing hybrid pipelines without changes to optimizers, training stacks, or inference serving infrastructure, making it practical for real-world deployment.
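The FLOP-aware token scheduling the abstract describes can be illustrated with a minimal sketch. The cost model below (attention cost quadratic in its own chunk, SSM cost linear) and the function `flop_aware_split` are illustrative assumptions, not the paper's actual implementation; the point is only that the split is a deterministic function of sequence length, so no learned router is needed.

```python
def flop_aware_split(seq_len: int) -> tuple[int, int]:
    """Deterministically split `seq_len` tokens between an attention
    branch and an SSM branch so both finish in roughly equal time.

    Assumed (hypothetical) per-branch cost model, up to a shared
    d_model factor that cancels out:
      attention chunk of n tokens : ~ n * n   (quadratic in its chunk)
      SSM chunk of m tokens       : ~ m       (linear scan)
    Balancing n^2 = seq_len - n gives n^2 + n - seq_len = 0,
    solved in closed form below.
    """
    # Positive root of n^2 + n - seq_len = 0.
    n_attn = int(((1 + 4 * seq_len) ** 0.5 - 1) / 2)
    # Keep both branches non-empty so all parameters stay utilized.
    n_attn = max(1, min(seq_len - 1, n_attn))
    return n_attn, seq_len - n_attn
```

Because the split depends only on static shapes, it can be computed once per batch configuration, which is what makes the conditional computation deterministic and deployment-friendly compared with learned expert routing.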
Submission Type: Emerging
Copyright Form: pdf
Submission Number: 285