Abstract: The rapid development of generative models has significantly
advanced font generation. However, little attention has been devoted to the evaluation and interpretability of graphical fonts. Existing quality assessment models provide only basic visual analyses, such as judging clarity and brightness, without offering in-depth explanations. To address these
limitations, we first constructed a large-scale multimodal
dataset named the Diversity Font Dataset (DFD), comprising 135,000 font-text pairs. This dataset encompasses a wide
range of generated font types and annotations, including
language descriptions and quality assessments, thus providing a robust foundation for training and evaluating font
analysis models. Based on this dataset, we developed Font-Agent, an agent built upon a Vision-Language Model (VLM) that aims to enhance font quality assessment and offer interpretable question-answering capabilities. Alongside the VLM's original visual encoder, we integrated an Edge-Aware Traces
(EAT) module to capture detailed edge information of font
strokes and components. Furthermore, we introduced a Dynamic Direct Preference Optimization (D-DPO) strategy to
facilitate efficient model fine-tuning. Experimental results
demonstrate that Font-Agent achieves state-of-the-art performance on the proposed dataset. To further evaluate the
generalization ability of our algorithm, we conducted additional experiments on several public datasets. The results
highlight the notable advantage of Font-Agent in both assessing the quality of generated fonts and comprehending
their content.
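
The abstract only names the Edge-Aware Traces (EAT) module without describing its internals. As a rough illustration of what an edge-aware branch running alongside a VLM's visual encoder could look like, the minimal PyTorch sketch below extracts stroke edges with fixed Sobel kernels and projects them into token embeddings. Every design detail here (Sobel filtering, the patch-style projection, the tensor shapes) is an assumption for illustration, not the paper's actual architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class EdgeAwareTraces(nn.Module):
        """Hypothetical edge branch: extracts stroke edges with fixed
        Sobel filters, then projects edge maps into the visual
        encoder's embedding space as extra tokens."""
        def __init__(self, embed_dim=768, patch=14):
            super().__init__()
            sobel_x = torch.tensor([[-1., 0., 1.],
                                    [-2., 0., 2.],
                                    [-1., 0., 1.]])
            sobel_y = sobel_x.t()
            # Fixed (non-learned) edge kernels, stored as buffers.
            self.register_buffer("kx", sobel_x.view(1, 1, 3, 3))
            self.register_buffer("ky", sobel_y.view(1, 1, 3, 3))
            # Learned projection from the edge map to token embeddings.
            self.proj = nn.Conv2d(1, embed_dim, kernel_size=patch, stride=patch)

        def forward(self, images):                       # images: (B, 3, H, W)
            gray = images.mean(dim=1, keepdim=True)      # (B, 1, H, W)
            gx = F.conv2d(gray, self.kx, padding=1)
            gy = F.conv2d(gray, self.ky, padding=1)
            edges = torch.sqrt(gx ** 2 + gy ** 2 + 1e-6) # edge magnitude
            tokens = self.proj(edges)                    # (B, D, H/p, W/p)
            return tokens.flatten(2).transpose(1, 2)     # (B, N, D)

How these edge tokens are fused with the encoder's patch tokens (concatenation, addition, or cross-attention) is likewise not specified in the abstract; any fusion scheme paired with this sketch is an assumption.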
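
Similarly, D-DPO is only named, not described. For orientation, the sketch below implements the standard Direct Preference Optimization loss, with a hypothetical per-sample weight w standing in for whatever "dynamic" mechanism the paper actually uses; the weight, its name, and the default beta are all assumptions.

    import torch
    import torch.nn.functional as F

    def dpo_loss(logp_pi_chosen, logp_pi_rejected,
                 logp_ref_chosen, logp_ref_rejected,
                 beta=0.1, w=None):
        """Standard DPO objective over log-probs of chosen/rejected
        responses under the policy and a frozen reference model.
        `w` (optional) rescales each pair's preference margin."""
        margin = ((logp_pi_chosen - logp_ref_chosen)
                  - (logp_pi_rejected - logp_ref_rejected))
        if w is not None:        # hypothetical dynamic per-sample weighting
            margin = w * margin
        return -F.logsigmoid(beta * margin).mean()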