Tool: Automatically Extracting Hardware Descriptions from PDF Technical Documentation

JSYS 2023 May Papers Submission2 Authors

29 Apr 2023 (modified: 05 May 2023), JSYS 2023 May Papers Submission
Keywords: hardware-dependent software, technical documentation, knowledge graph, code generation, open source
TL;DR: The paper describes a tool that implements a modular processor for extracting detailed data sets for thousands of microcontrollers from technical documentation, using deterministic table processing.
Abstract: The ever-increasing selection of microcontrollers makes porting embedded software to new devices a largely manual effort, while code generators are used only in special cases. In practice, usable data is limited to machine-readable formats, and the substantial body of technical documentation is difficult to access because of the print-oriented nature of PDF. We therefore identify the need for a processor that accesses PDF documentation and extracts high-quality data to enable more code generation for embedded software. In this paper, we design and implement a modular processor that uses deterministic table processing to extract detailed data sets from technical documentation for thousands of microcontrollers: device identifiers, interrupt tables, packages and pinouts, pin functions, and register maps. Our evaluation on STMicro documentation compares the completeness and correctness of these data sets against existing machine-readable sources, reaching a weighted average of 96.5% across almost 6 million data points, and also uncovers several issues in both sources. We show that our tool yields highly accurate data with only limited manual effort and can enable and enhance many existing and new code generation use cases in the embedded software domain that are currently held back by a lack of machine-readable data sources.
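The abstract does not spell out how deterministic table processing or the completeness and correctness scores work. The Python sketch below illustrates one plausible reading under stated assumptions: a PDF table extractor has already produced rows of cell strings for a register map, header captions are matched through a fixed alias table, offsets are printed as hexadecimal literals, and a row without an offset is a register name that wrapped onto the next line. The names `parse_register_table`, `compare`, `weighted_average`, and `HEADER_ALIASES`, the metric definitions, and the weighting by data-point count are all hypothetical and not taken from the paper; the GPIO table and reference values are toy data, not evaluation results.

```python
import re

# Hypothetical header aliases: map the column captions that may appear in a
# PDF register-map table onto canonical field names (assumed, not from the paper).
HEADER_ALIASES = {
    "offset": "offset",
    "address offset": "offset",
    "register": "name",
    "register name": "name",
}

# Offsets are assumed to be printed as hexadecimal literals such as "0x08".
OFFSET_RE = re.compile(r"0x([0-9A-Fa-f]+)")


def normalize(cell: str) -> str:
    """Collapse whitespace and lowercase a header cell."""
    return " ".join(cell.split()).lower()


def parse_register_table(rows: list[list[str]]) -> dict[str, int]:
    """Deterministically turn table rows (first row = header) into name -> offset."""
    header = [HEADER_ALIASES.get(normalize(cell)) for cell in rows[0]]
    name_col = header.index("name")
    offset_col = header.index("offset")
    parsed: list[list] = []  # [name, offset] pairs in table order
    for row in rows[1:]:
        name = row[name_col].strip()
        match = OFFSET_RE.search(row[offset_col])
        if match:
            parsed.append([name, int(match.group(1), 16)])
        elif parsed and name:
            # A row without an offset is treated as a name that wrapped onto
            # the next line of the PDF, so it is appended to the previous name.
            parsed[-1][0] += name
    return {name: offset for name, offset in parsed}


def compare(extracted: dict[str, int], reference: dict[str, int]) -> tuple[float, float]:
    """Completeness = share of reference entries found; correctness = share of
    found entries whose value matches the reference (assumed definitions)."""
    common = extracted.keys() & reference.keys()
    completeness = len(common) / len(reference) if reference else 1.0
    matching = sum(1 for key in common if extracted[key] == reference[key])
    correctness = matching / len(common) if common else 1.0
    return completeness, correctness


def weighted_average(scores: list[tuple[float, int]]) -> float:
    """Average (score, number_of_data_points) pairs, weighted by data points."""
    total = sum(weight for _, weight in scores)
    return sum(score * weight for score, weight in scores) / total


# Illustrative input only: a register table as a PDF table extractor might
# return it, with one register name wrapped across two rows.
rows = [
    ["Offset", "Register name", "Reset value"],
    ["0x00", "GPIOx_MODER", "-"],
    ["0x04", "GPIOx_", "-"],
    ["", "OTYPER", ""],
    ["0x08", "GPIOx_OSPEEDR", "-"],
]
# Illustrative machine-readable reference (e.g. derived from an SVD file).
reference = {"GPIOx_MODER": 0x00, "GPIOx_OTYPER": 0x04,
             "GPIOx_OSPEEDR": 0x08, "GPIOx_IDR": 0x10}

extracted = parse_register_table(rows)
completeness, correctness = compare(extracted, reference)
print(extracted)  # {'GPIOx_MODER': 0, 'GPIOx_OTYPER': 4, 'GPIOx_OSPEEDR': 8}
print(completeness, correctness)  # 0.75 1.0
print(weighted_average([(completeness, len(reference)), (correctness, len(extracted))]))
```

Because every step in this sketch is a fixed rule rather than a learned model, the same input table always produces the same output, so any mismatch against the reference can be traced back to a specific table row or reference entry. Weighting each score by the number of data points it covers keeps small data sets from dominating the overall figure; the paper's actual weighting may differ.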
Area: Computer Architecture
Type: Tool/benchmark
Revision: No
Submission Number: 2