Research Area: LMs and interactions, LMs with tools and code, LMs on diverse modalities and novel applications
Keywords: Structured Data;Instruction Tuning Dataset;Model Scalability Analysis
TL;DR: Our Mistral- and CodeLlama-based models, trained on a dataset of 1.1 million examples, set new standards on structured knowledge tasks, while showing that scaling model size yields only marginal gains on these skills.
Abstract: Structured data sources, such as tables, graphs, and databases, are ubiquitous knowledge sources. Despite the demonstrated capabilities of large language models (LLMs) on plain text, their proficiency in interpreting and utilizing structured data remains limited. Our investigation reveals a notable deficiency in LLMs' ability to process structured data, e.g., ChatGPT lags behind state-of-the-art (SoTA) models by an average of 35\%. To augment the Structured Knowledge Grounding (SKG) capabilities of LLMs, we have developed a comprehensive instruction tuning dataset comprising 1.1 million examples. Utilizing this dataset, we train a series of models, referred to as $\texttt{structlm}$, based on Mistral and the CodeLlama model family, ranging from 7B to 34B parameters. Our $\texttt{structlm}$ series surpasses task-specific models~\citep{UnifiedSKG2022} on 16 of 18 evaluated datasets and establishes new SoTA performance on 8 SKG tasks. Furthermore, $\texttt{structlm}$ demonstrates strong generalization across 6 novel held-out SKG tasks, outperforming TableLlama by an average of 35\% and Flan-UL2 20B by an average of 10\%. Contrary to expectations, we observe that scaling model size offers marginal benefits, with $\texttt{structlm}$-34B showing only slight improvements over $\texttt{structlm}$-7B. This suggests that structured knowledge grounding remains a challenging task and requires more innovative design to push it to a new level. We release the model weights and training dataset to the community, along with relevant code on GitHub.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 785