Abstract: Large language models (LLMs) have exhibited great potential to assist chip design and analysis. Recent research efforts have focused mainly on text-based tasks such as general QA, debugging, and design tool scripting. However, chip design and implementation workflows usually require visual understanding of diagrams, flow charts, graphs, schematics, waveforms, etc., which demands the development of multimodal foundation models. In this paper, we propose ChipVQA, a benchmark designed to evaluate the capability of vision-language models for chip design. ChipVQA includes 142 carefully designed and collected VQA questions covering five chip design disciplines: Digital Design, Analog Design, Architecture, Physical Design, and Semiconductor Manufacturing. Unlike existing VQA benchmarks, ChipVQA questions are constructed by chip design experts and require in-depth domain knowledge and reasoning to solve. We conduct comprehensive evaluations of both open-source and proprietary multimodal models and find that they are significantly challenged by the benchmark suite. ChipVQA is available at https://github.com/phdyang007/chipvqa.