Automatic Entity Recognition and Typing in Massive Text Corpora

Xiang Ren, Ahmed El-Kishky, Chi Wang, Jiawei Han

2016 (modified: 12 Nov 2022), WWW (Companion Volume) 2016

Abstract: In today's computerized and information-based society, we are inundated with vast amounts of natural language text data, ranging from news articles, product reviews, and advertisements to a wide range of user-generated content on social media. To turn such massive unstructured text data into actionable knowledge, one of the grand challenges is to understand entities and the relationships between them. In this tutorial, we introduce data-driven methods for recognizing typed entities of interest in different kinds of text corpora, especially massive, domain-specific ones. These methods automatically identify token spans as entity mentions in text and label their types (e.g., person, product, food) in a scalable way. Using real datasets, including news articles and Yelp reviews, we demonstrate how these typed entities aid knowledge discovery and management.
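The task output described above (token spans marked as entity mentions, each carrying a type label) can be made concrete with a small data structure. This is an illustrative sketch under stated assumptions, not the data-driven methods the tutorial covers: `TypedMention`, `lookup_mentions`, and the hand-written gazetteer are hypothetical names, and exact dictionary lookup is a deliberately naive stand-in for learned recognition and typing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedMention:
    """A typed entity mention over a tokenized sentence (hypothetical structure)."""
    start: int        # index of the first token in the span (inclusive)
    end: int          # index one past the last token (exclusive)
    entity_type: str  # type label, e.g. "person", "product", "food"


def mention_text(tokens: list[str], m: TypedMention) -> str:
    """Return the surface string covered by a mention."""
    return " ".join(tokens[m.start:m.end])


def lookup_mentions(tokens: list[str],
                    gazetteer: dict[tuple[str, ...], str]) -> list[TypedMention]:
    """Toy baseline (an assumption, not the tutorial's method): greedy
    longest-match lookup of lowercased token spans in a type dictionary."""
    max_len = max((len(key) for key in gazetteer), default=0)
    lowered = [t.lower() for t in tokens]
    mentions = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(lowered[i:i + n]))
            if label is not None:
                mentions.append(TypedMention(i, i + n, label))
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no span starting here matched; move to the next token
    return mentions


if __name__ == "__main__":
    tokens = "Alice ordered pad thai and an iced coffee".split()
    gazetteer = {("alice",): "person", ("pad", "thai"): "food",
                 ("iced", "coffee"): "food"}
    for m in lookup_mentions(tokens, gazetteer):
        print(mention_text(tokens, m), m.entity_type)
```

Spans are half-open, so `end - start` is the mention length in tokens. A fixed dictionary like this only finds surface forms it already lists and cannot use context to type ambiguous or unseen mentions, which is part of what motivates data-driven approaches for large, domain-specific corpora.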