Abstract: Voice agents built on ASR-LLM-TTS pipelines allocate compute statically, wasting resources on simple queries while under-provisioning complex ones. We present PAVO (Pipeline-Aware Voice Orchestrator), which routes each turn through the three-stage pipeline based on demand signals extracted before transcription begins. We observe that ASR errors propagate to downstream LLMs in two distinct regimes: a sharp factual-accuracy cliff and a gradual semantic degradation. This yields inter-stage coupling constraints that prior routing systems ignore. We validate this structure on n=5,430 direct calibration measurements across two hardware platforms (H100, M3) and three LLM families (Llama 3.1 8B, Mistral 7B, Gemma 2 2B), and enforce the constraints via hard logit masking in an 85K-parameter RL-trained meta-controller, reducing coherence failures by 7.9x. PAVO achieves 34% lower median latency and 71% lower energy than rigid cloud baselines on a 50K-turn simulated benchmark, and direct H100 experiments on 200 LibriSpeech samples confirm 10.3% P95 tail compression (p = 2x10^-6). Code and data are publicly available.
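The abstract's hard logit masking can be illustrated with a minimal sketch: routes that violate the coupling constraints receive a logit of negative infinity before the softmax, so the controller assigns them zero probability and can never select them. The function name and the use of a boolean validity mask are assumptions for illustration, not the paper's actual implementation.

```python
import math

def mask_invalid_routes(logits, valid_mask):
    # Hard logit masking (illustrative sketch): routes whose
    # valid_mask entry is False get a -inf logit, so the softmax
    # below assigns them exactly zero probability.
    masked = [l if ok else -math.inf for l, ok in zip(logits, valid_mask)]
    # Numerically stable softmax over the masked logits.
    m = max(masked)
    exps = [math.exp(x - m) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example: three candidate routes, the second one
# violates an inter-stage coupling constraint.
probs = mask_invalid_routes([2.0, 0.5, 1.0], [True, False, True])
```

Because the mask is applied to the logits rather than as a penalty in the reward, the constraint holds exactly at every step of training and deployment.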
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Tatiana_Likhomanenko1
Submission Number: 8587