Automatic Web Information Extraction in the ROADRUNNER System

Valter Crescenzi, Giansalvatore Mecca, Paolo Merialdo

Published: 2001, Last Modified: 04 Oct 2025, ER (Workshops) 2001, CC BY-SA 4.0

Abstract: This paper presents Road Runner, a research project that aims to develop solutions for automatically extracting data from large HTML data sources. Our research targets data-intensive Web sites, i.e., HTML-based sites with a fairly complex structure that publish large amounts of data. The paper describes the top-level software architecture of the Road Runner system and the novel research challenges posed by the attempt to automate the information extraction process.