Abstract: Diseases and their symptoms are a frequent information need for Web users. Diseases are often categorized into sub-types, which manifest through different symptoms. Extracting such information from textual corpora is inherently difficult, yet it can be extracted easily from semi-structured resources such as tables. We propose an approach for identifying tables that contain information about sub-type classifications and their attributes. Because tables often have diverse and redundant schemas, we align equivalent columns across disparate schemas so that information about diseases is accessible through a unified, common schema. Experimental evaluation shows that we can accurately identify tables containing disease sub-type classifications and align equivalent columns.