Abstract: This paper addresses how to obtain argument identification tools for the vast majority of languages that, unlike English, have little to no relevant labeled data. It takes an under-resourced language, Portuguese, as a case study and experiments with three techniques for coping with data scarcity: obtaining labeled data by machine-translating data sets from another language that are annotated for argument identification; transferring to the argument identifier the language knowledge captured in distributional semantic models obtained while solving other tasks for which more data exist; and expanding the argument identification data with text augmentation techniques. The results demonstrate that it is possible to develop argument identification tools for under-resourced languages with a level of performance competitive with that of tools for languages with relevant language resources.