EXPLOITING LATENT INFORMATION IN RELATIONAL DATABASES VIA WORD EMBEDDING

Rajesh Bordawekar, Oded Shmueli, Bortik Bandyopadhyay

Feb 12, 2018 (modified: Jun 04, 2018) · ICLR 2018 Workshop Submission
  • Abstract: We propose Cognitive Databases, an approach for transparently enabling Artificial Intelligence (AI) capabilities in relational databases. A novel aspect of our design is to first view the structured data source as meaningful unstructured text, and then use the text to build a word embedding model. This model captures the hidden inter-/intra-column relationships between database tokens of different types, such as numeric values, SQL dates, and even images. For each database token, the model includes a vector that encodes contextual semantic relationships. We seamlessly integrate the word embedding model into existing SQL query infrastructure and use it to enable a new class of SQL-based analytics queries called cognitive intelligence (CI) queries. CI queries use the model vectors to enable complex queries such as semantic similarity/dissimilarity, inductive reasoning queries (e.g., analogies and semantic clustering), predictive queries using entities not present in the database, and, more generally, queries that draw on knowledge from external sources.
  • Keywords: Word Embedding, Relational Databases, Novel SQL queries
  • TL;DR: Describes an approach to capturing hidden information from a relational database using word embeddings, and to using that information in SQL queries.
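The core idea in the abstract — view each row as a "sentence" of tokens, learn vectors for the tokens, then answer similarity queries over those vectors — can be sketched in a few lines. The toy table, the co-occurrence-count vectors (a stand-in for an actual word2vec model), and the `similarity` function below are all illustrative assumptions, not the paper's implementation; in the paper's design, such a function would be exposed to SQL as a user-defined function inside a CI query.

```python
from collections import defaultdict
import math

# Hypothetical toy purchases table: each row is treated as a
# "sentence" of tokens, following the row-as-text idea.
rows = [
    ("cust1", "bread", "dairy"),
    ("cust1", "milk", "dairy"),
    ("cust2", "milk", "dairy"),
    ("cust2", "beer", "alcohol"),
    ("cust3", "wine", "alcohol"),
]

# Stand-in for embedding training: each token's vector counts the
# tokens it shares a row with (a simple co-occurrence embedding).
vocab = sorted({tok for row in rows for tok in row})
index = {tok: i for i, tok in enumerate(vocab)}
vectors = {tok: [0.0] * len(vocab) for tok in vocab}
for row in rows:
    for a in row:
        for b in row:
            if a != b:
                vectors[a][index[b]] += 1.0

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0

# UDF-style similarity, as a CI query would invoke from SQL.
def similarity(tok_a, tok_b):
    return cosine(vectors[tok_a], vectors[tok_b])

# A semantic similarity "query": milk is closer to bread (shared
# customer and category context) than to wine (no shared context).
print(similarity("milk", "bread") > similarity("milk", "wine"))  # → True
```

A real deployment would train word2vec-style vectors over the tokenized rows and register the distance function with the database engine so it can appear directly in SQL predicates.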
