Figure and caption extraction from biomedical documentsDownload PDFOpen Website

2019 (modified: 05 Feb 2023)Bioinform. 2019Readers: Everyone
Abstract: Figures and captions convey essential information in biomedical documents. As such, there is a growing interest in mining published biomedical figures and in utilizing their respective captions as a source of knowledge. Notably, an essential step underlying such mining is the extraction of figures and captions from publications. While several PDF parsing tools that extract information from such documents are publicly available, they attempt to identify images by analyzing the PDF encoding and structure and the complex graphical objects embedded within. As such, they often incorrectly identify figures and captions in scientific publications, whose structure is often non-trivial. The extraction of figures, captions and figure-caption pairs from biomedical publications is thus neither well-studied nor yet well-addressed.
0 Replies

Loading