Towards the Creation of the Filipino Wordnet: A Two-Way Approach

Briane Paul Samson; Charibeth Cheng; Unisse C. Chua; Dan John Velasco; Axel Alba; Trisha Gail Pelagio; Bryce Anthony Ramirez; Robi Jeanne Bangonon; Christine Deticio; Sharmaine Gaw; Danielle Kirsten Sison; Criscela Ysabelle Racelis; James Kevin Lin; Mark Edward M. Gonzales; Phoebe Clare Ong

Published: 01 Jan 2023, Last Modified: 19 Feb 2025. Venue: IALP 2023. License: CC BY-SA 4.0

Abstract: As databases of words and the lexical relationships among them, WordNets are important for various downstream natural language processing applications. However, constructing a WordNet can be challenging, especially for low-resource languages such as Filipino. The existing Filipino WordNet has not been maintained and lacks the contextual information needed to identify how word senses evolve. In this study, we built a corpus of 5,370,667 unique tokens and used it to construct a Filipino WordNet via a two-way approach that combines natural language processing and network science. For the natural language processing approach, we utilized only two linguistic sources: our corpus and a RoBERTa-based language model that generates sentence embeddings. For the network science approach, we created a temporal-multiplex network that represents the co-occurrence of words, their semantic relationships, and their usage in different sources across time. We show that our proposed method can induce existing senses (30% of our validation data, as evaluated by matching with the senses from Princeton WordNet) and generate 9,549 semantic sets.
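To make the idea of a temporal-multiplex co-occurrence network more concrete, the following is a minimal sketch and not the authors' implementation. It assumes one layer per (source, year) pair and counts word pairs that appear within a small sliding window; the function names `build_layers` and `neighbors`, the `window` parameter, and the toy sentences are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def build_layers(documents, window=2):
    """Build one co-occurrence layer per (source, year).

    documents: iterable of (source, year, tokens) tuples (assumed format).
    Returns a dict mapping (source, year) to a Counter of unordered word pairs.
    """
    layers = defaultdict(Counter)
    for source, year, tokens in documents:
        for i, word in enumerate(tokens):
            # Pair each word with the next `window` tokens that follow it.
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    pair = tuple(sorted((word, other)))
                    layers[(source, year)][pair] += 1
    return dict(layers)


def neighbors(layers, word, layer_key):
    """Return the set of words that co-occur with `word` in one layer."""
    found = set()
    for first, second in layers.get(layer_key, {}):
        if first == word:
            found.add(second)
        elif second == word:
            found.add(first)
    return found


# Toy data (invented for this sketch): the same word in two sources and years.
docs = [
    ("news", 2019, ["ang", "bato", "ay", "matigas"]),
    ("social", 2022, ["masakit", "ang", "bato", "ko"]),
]
layers = build_layers(docs)
for key in sorted(layers):
    print(key, sorted(neighbors(layers, "bato", key)))
```

Comparing the neighbor sets of a word across layers is one simple way to look for differences in usage between sources or over time; the paper's actual network also encodes semantic relationships, which this sketch leaves out.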
