Abstract: As databases of lexical information on words and their lexical relationships, WordNets are important for various downstream natural language processing applications. However, constructing a WordNet can be challenging, especially for low-resource languages such as Filipino. The existing Filipino WordNet has not been maintained and lacks the contextual information needed to identify how word senses evolve. In this study, we built a corpus of 5,370,667 unique tokens and used it to construct a Filipino WordNet via a two-way approach that combines natural language processing and network science. For the natural language processing approach, we used only two linguistic sources: our corpus and a RoBERTa-based language model that generates sentence embeddings. For the network science approach, we created a temporal-multiplex network that represents the co-occurrence of words, their semantic relationships, and their usage in different sources across time. We show that our proposed method can induce existing senses (30% of our validation data, as evaluated by matching against senses from Princeton WordNet) and generate 9,549 semantic sets.
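To make the idea of a temporal-multiplex co-occurrence network concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. It assumes each corpus record is a hypothetical `(source, period, tokens)` triple, treats each `(source, period)` pair as one layer, and weights an edge by the number of sentences in which two words co-occur. The semantic-relationship layers derived from RoBERTa sentence embeddings are not modeled here. The function names and the toy records are illustrative only.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical record format (assumption): (source, period, tokenized sentence).
Record = tuple[str, str, list[str]]


def build_temporal_multiplex(records: list[Record]) -> dict[tuple[str, str], Counter]:
    """Return {(source, period): Counter({(w1, w2): weight})}.

    Each layer is keyed by a (source, period) pair. An undirected edge between
    two distinct words is weighted by the number of sentences in that layer
    in which both words appear.
    """
    layers: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for source, period, tokens in records:
        # Deduplicate and sort so each unordered word pair is counted once per sentence.
        vocab = sorted(set(tokens))
        for pair in combinations(vocab, 2):
            layers[(source, period)][pair] += 1
    return dict(layers)


def layers_containing(layers: dict[tuple[str, str], Counter], word: str) -> list[tuple[str, str]]:
    """List the layers (source, period) in which a word has at least one edge."""
    return sorted(key for key, edges in layers.items()
                  if any(word in pair for pair in edges))


# Toy usage with made-up records (illustrative data, not from the corpus).
records = [
    ("news", "2010s", ["ang", "bahay", "malaki"]),
    ("news", "2010s", ["malaki", "na", "bahay"]),
    ("social", "2020s", ["bahay", "malaki"]),
]
layers = build_temporal_multiplex(records)
print(layers[("news", "2010s")][("bahay", "malaki")])  # 2
print(layers_containing(layers, "bahay"))  # [('news', '2010s'), ('social', '2020s')]
```

Keeping one layer per source and time period lets the same word node be compared across layers, which is one simple way to track how a word's neighborhood, and hence its usage, shifts over time.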