Keywords: Enterprise Knowledge, Automated Knowledge Base Construction, Generative Modelling, Probabilistic Programming
TL;DR: An automated system for incrementally extracting multi-typed entities from private enterprise data, including emails, calendar events and documents, with users in the loop
Abstract: We present Enterprise Alexandria, one of the core AI technologies behind Microsoft Viva Topics. Enterprise Alexandria is a new system for automatically constructing a knowledge base of high-precision, typed entities from private enterprise data such as emails, documents and intranet pages. Built as an extension of Alexandria [Winn et al., 2019], the key novelty of Enterprise Alexandria is its ability to process both the textual information and the structured metadata available in each document in an online learning fashion, making use of any manual curation performed in the interim. This task is performed entirely eyes-off to respect the privacy of users and the restricted access to their documents. The knowledge discovery process uses a probabilistic program that defines how each data item is generated from a set of unknown typed entities. Using probabilistic inference, Enterprise Alexandria can jointly discover a large set of entities with custom types specific to the organization. Experiments on three real-world datasets show that the system outperforms alternative methods and works effectively at large scale.
Subject Areas: Information Extraction, Machine Learning
Archival Status: Archival