Abstract: Many data analysis and data integration applications need to account for multiple representations of the same entity. Variations in entity mentions arise in complex ways that are hard to capture with a textual similarity function. More sophisticated functions require knowledge of the underlying structure of entity representations. Traditionally, people identify these structures manually and write programs to manipulate them, which is tedious and cumbersome. We have built LUSTRE, an active-learning-based system that learns structured representations of entities interactively from a few labels. In the background, it automatically generates programs that map entity mentions to their representations and standardize them to a unique representation. Furthermore, LUSTRE provides a user-friendly interface that allows users to declaratively specify normalization and variant generation functions for downstream applications.