Toward best-effort information extraction

Warren Shen; Pedro DeRose; Robert McCann; AnHai Doan; Raghu Ramakrishnan

Published: 01 Jan 2008, Last Modified: 18 Jul 2024. SIGMOD Conference 2008. License: CC BY-SA 4.0

Abstract: Current approaches to developing information extraction (IE) programs have largely focused on producing precise IE results. As such, they suffer from three major limitations. First, it is often difficult to execute partially specified IE programs and obtain meaningful results, which leads to a long "debug loop". Second, it often takes a long time to obtain the first meaningful result (by finishing and running a precise IE program), which makes these approaches impractical for time-sensitive IE applications. Finally, by trying to write precise IE programs we may also waste a significant amount of effort, because an approximate result -- one that can be produced quickly -- may already be satisfactory in many IE settings. To address these limitations, we propose iFlex, an IE approach that relaxes the precise IE requirement to enable best-effort IE. In iFlex, a developer U uses a declarative language to quickly write an initial approximate IE program P with a possible-worlds semantics. Then iFlex evaluates P using an approximate query processor to quickly extract an approximate result. Next, U examines the result and, if necessary, further refines P to obtain increasingly precise results. To refine P, U can enlist a next-effort assistant, which suggests refinements based on the data and the current version of P. Extensive experiments on real-world domains demonstrate the utility of the iFlex approach.
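The write, evaluate, refine loop described in the abstract can be illustrated with a toy sketch. This is not iFlex's declarative language or its approximate query processor; the function `extract_candidates`, the sample listing, and the constraints are all hypothetical and chosen only for illustration. The sketch assumes an approximate result can be modeled as a set of candidate values, where each possible world picks one candidate as the true value, and each refinement shrinks that set.

```python
import re
from typing import Callable, List, Set

# Hypothetical toy model, not the iFlex language: a "program" is a list of
# constraints, and its approximate result is the set of all tokens that
# satisfy every constraint (one candidate per possible world).
Constraint = Callable[[str], bool]


def extract_candidates(text: str, program: List[Constraint]) -> Set[str]:
    """Return every alphanumeric token that satisfies all constraints."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return {t for t in tokens if all(c(t) for c in program)}


# Made-up input document for illustration.
doc = "Cozy house, 3 beds, built 1987, asking 250000 dollars"

# Initial approximate program: "the price is some numeric token".
program: List[Constraint] = [lambda t: t.isdigit()]
first = extract_candidates(doc, program)

# Refinement after inspecting the result: "the price exceeds 10000".
# all() short-circuits, so int(t) only runs on tokens that passed isdigit().
program.append(lambda t: int(t) > 10000)
refined = extract_candidates(doc, program)

print(sorted(first, key=int))  # three candidates remain
print(sorted(refined))         # one candidate remains
```

In this toy setting the first run is already usable as a coarse answer (the price is one of three numbers), and a single added constraint narrows it to one value, which mirrors the trade-off between effort and precision that the abstract motivates.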
