Domain Specific Features Driven Information Extraction from Web Pages of Scientific Conferences

Piotr Andruszkiewicz, Rafal Hazan

2017 (modified: 12 Nov 2021), CICLing (1) 2017

Abstract: In this paper we describe information extraction from the web pages of scientific conferences. We enrich known features with new features specific to this domain and show their importance in the extraction process. Moreover, we investigate various data representation models, e.g., based on single tokens or on sequences, to find the best configuration for this task, and we set up a new baseline on a publicly available corpus.
