Abstract: This study addresses the critical gap in Large Language Model (LLM) evaluation for business intelligence by conducting a rigorous comparative analysis of Llama3-70b-8192 and BambooLLM across five
key data analysis tasks. Utilizing the AdventureWorks Cycle dataset, we developed a comprehensive
evaluation framework measuring task efficiency, weighted accuracy, and misinterpretation rates. Results
demonstrate that Llama3-70b-8192 outperforms BambooLLM with a 40% lower misinterpretation rate
and 25% higher task efficiency across structured and interpretive business intelligence challenges. This
study highlights the potential of fine-tuning strategies optimized for tasks that combine structured and interpretive elements, and it informs future research directions in LLM evaluation for business intelligence applications.