Keywords: LLMs, structured JSON, schema adherence, parseability, GRPO, multi-reward reinforcement learning, Group Relative Policy Optimization, schema-faithful generation, JSON validation, LoRA fine-tuning, Qwen models, small language models, open models, biomanufacturing, pharma, regulatory traceability, schema compliance, structured extraction, constrained decoding, auditable outputs, on-prem deployment, adjusted match, adjusted noise, parse success, synthetic data, semantic judge reward, reinforcement learning from feedback, lightweight training, model interpretability, JSON generation
Abstract: We present a method that teaches small and medium-sized language models to generate syntactically valid, schema-conformant JSON without relying on slow grammar-constrained decoding. Using multi-reward reinforcement learning with Group Relative Policy Optimization (GRPO), the models learn to follow structural rules, match keys and values accurately, and stay consistent with human-defined schemas. The approach trains efficiently on limited hardware via LoRA fine-tuning of open Qwen models and produces auditable, parseable outputs suited to real-world use in regulated industries such as biomanufacturing and pharma.
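To make the multi-reward setup concrete, below is a minimal sketch of how per-completion rewards for parse success, schema adherence, and key/value match (the components named in the keywords) could be combined for GRPO-style training. The helper names (`parse_reward`, `schema_reward`, `match_reward`, `total_reward`), the weights, and the penalty for extra keys are illustrative assumptions, not the paper's published formulation; the semantic judge and adjusted-noise rewards are omitted.

```python
import json

def parse_reward(completion: str) -> float:
    """1.0 if the completion is valid JSON, else 0.0 (parse-success reward)."""
    try:
        json.loads(completion)
        return 1.0
    except json.JSONDecodeError:
        return 0.0

def schema_reward(completion: str, schema_keys: set[str]) -> float:
    """Fraction of required top-level keys present, with a small
    penalty for hallucinated extra keys. Penalty weight is an assumption."""
    try:
        obj = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(obj, dict) or not schema_keys:
        return 0.0
    present = schema_keys & obj.keys()
    extra = obj.keys() - schema_keys
    return max(0.0, (len(present) - 0.5 * len(extra)) / len(schema_keys))

def match_reward(completion: str, reference: dict) -> float:
    """Fraction of reference key/value pairs reproduced exactly."""
    try:
        obj = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(obj, dict) or not reference:
        return 0.0
    hits = sum(1 for k, v in reference.items() if obj.get(k) == v)
    return hits / len(reference)

def total_reward(completion: str, schema_keys: set[str], reference: dict,
                 w_parse: float = 0.3, w_schema: float = 0.3,
                 w_match: float = 0.4) -> float:
    """Weighted sum of the component rewards; in GRPO these scalar scores
    would then be advantage-normalized within each group of sampled
    completions for the same prompt."""
    return (w_parse * parse_reward(completion)
            + w_schema * schema_reward(completion, schema_keys)
            + w_match * match_reward(completion, reference))

# Example: score one sampled completion against a toy schema/reference.
ref = {"batch_id": "B-102", "ph": 7.2}
out = '{"batch_id": "B-102", "ph": 7.2}'
print(total_reward(out, schema_keys=set(ref), reference=ref))  # -> 1.0
```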
Paper Type: Long
Research Area: Hierarchical Structure Prediction, Syntax, and Parsing
Research Area Keywords: quantization; pruning; distillation; parameter-efficient-training; data-efficient training; data augmentation; LLM Efficiency; NLP in resource-constrained settings
Contribution Types: NLP engineering experiment, approaches to low-compute settings (efficiency)
Languages Studied: English
Submission Number: 2143