Safurai 001: New Qualitative Approach for Evaluation

19 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference · Desk Rejected Submission
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: AI, LLM, Evaluation Metrics, Coding Assistance
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: An exploration of Safurai-001, a more conversational LLM for coding assistance, and the introduction of the GPT4-based MultiParameters Evaluation Benchmark.
Abstract: This paper presents Safurai-001, a new Large Language Model (LLM) with significant potential in the domain of coding assistance. Driven by recent advancements in coding LLMs, Safurai-001 competes in performance with the latest models such as WizardCoder (1), PanguCoder (2) and Phi-1 (3), but aims to deliver a more "conversational" interaction. By capitalizing on progress in data engineering (the latest techniques in data transformation and prompt engineering) and instruction tuning, this new model promises to stand toe-to-toe with recent closed- and open-source developments. Recognizing the need for an effective evaluation metric for coding LLMs, this paper also introduces GPT4-based MultiParameters: an evaluation benchmark that harnesses varied parameters to present a comprehensive insight into the model's functioning and performance. Our assessment shows that Safurai-001 can outperform GPT-3.5 by 1.58% and WizardCoder by 18.78% on the Code Readability parameter, among others.
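The abstract describes a judge-based benchmark that scores model outputs along several qualitative parameters (Code Readability among them). A minimal sketch of how such a multi-parameter, LLM-as-judge evaluation could be wired up — the parameter names, the prompt wording, and the pluggable `judge` callable are illustrative assumptions, not the paper's actual implementation:

```python
from statistics import mean

# Illustrative parameter set; the paper's benchmark includes
# Code Readability among its evaluation parameters.
PARAMETERS = ["code_readability", "correctness", "helpfulness"]

def build_judge_prompt(task: str, answer: str, parameter: str) -> str:
    """Compose the prompt sent to the judge LLM (e.g. GPT-4)."""
    return (
        f"Rate the following coding-assistant answer on '{parameter}' "
        f"from 1 to 10. Reply with the number only.\n\n"
        f"Task:\n{task}\n\nAnswer:\n{answer}\n"
    )

def evaluate(task: str, answer: str, judge) -> dict:
    """Score one answer on every parameter and attach the mean score.

    `judge` is any callable mapping a prompt string to a numeric score;
    in practice it would wrap a call to the judge model's API.
    """
    scores = {p: judge(build_judge_prompt(task, answer, p)) for p in PARAMETERS}
    scores["overall"] = mean(scores[p] for p in PARAMETERS)
    return scores

if __name__ == "__main__":
    # Stub judge for demonstration: a real run would query GPT-4.
    stub = lambda prompt: 8.0
    print(evaluate("Reverse a string in Python", "s[::-1]", stub))
```

Per-parameter scores like these are what make percentage comparisons such as the reported 18.78% readability gap over WizardCoder possible, rather than a single pass/fail signal.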
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1744