Keywords: Large Language Models, Generative AI, Retrieval-Augmented Generation, Task Decomposition, Workflows
TL;DR: We present evidence that, on a domain-specific task such as low-code workflow generation, a fine-tuned SLM outperforms prompted LLMs.
Abstract: Large Language Models (LLMs) such as GPT-4o can handle a wide range of complex tasks with the right prompt. As per-token costs decline, the advantages of fine-tuning Small Language Models (SLMs) for real-world applications (faster inference, lower costs) may no longer be clear. In this work, we present evidence that, for domain-specific tasks that require structured outputs, SLMs still hold a quality advantage. We compare fine-tuning an SLM against prompting LLMs on the task of generating low-code workflows in JSON form. We observe that while a good prompt can yield reasonable results, fine-tuning improves quality by 10% on average. We also perform a systematic error analysis to reveal model limitations.
Submission Number: 30