A Preliminary Evaluation of Large Language Models for Data Science Code Generation

Farshad Ghorbanishovaneh; Lars Kotthoff

Published: 29 Aug 2025, Last Modified: 04 Sept 2025

Venue: AutoML 2025 Non-Archival Content Track

License: CC BY 4.0

Submission Type: Short paper

TL;DR: We evaluate how well large language models can autonomously generate complete, executable machine learning pipelines for a real-world classification task.

Submission Number: 10
