Abstract: myNLP is a free, open-source natural language processing (NLP) library focused on the Myanmar language. The library is implemented in the Python programming language and benchmarked on available Myanmar corpora. In this paper, we outline and compare different approaches for each of the language processing functionalities, as well as the datasets and pre-trained models. The library is organized hierarchically, comprising language processing functions and models for different NLP tasks. It will be publicly released on GitHub, with some larger models hosted on Hugging Face.
Paper Type: long
Research Area: NLP Applications
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: Myanmar language
Preprint Status: There is no non-anonymous preprint and we do not intend to release one.
A1: yes
A1 Elaboration For Yes Or No: In Section 5, Results and Discussion, we discuss our NER model, which has low precision, recall, and F1-score. Data scarcity is one of the major problems for low-resource languages such as Myanmar.
A2: n/a
A2 Elaboration For Yes Or No: Our paper is a benchmarking and system demonstration paper. We do not foresee any potential risks arising from our work.
A3: yes
A3 Elaboration For Yes Or No: Abstract and Section 1 Introduction
B: yes
B1: yes
B1 Elaboration For Yes Or No: We added footnotes and cited all artifacts properly, such as OpenNMT-tf in Section 2, Functionalities (Subsection 2.10, Machine Translation) and TensorFlow in Section 4, Methodologies.
B2: n/a
B3: n/a
B4: n/a
B4 Elaboration For Yes Or No: The datasets we used are open source, and most of them were developed by ourselves for the purposes of this research.
B5: n/a
B6: yes
B6 Elaboration For Yes Or No: Section 3 Datasets, Table 1
C: yes
C1: yes
C1 Elaboration For Yes Or No: Our experiments primarily focused on low-resource datasets, for which the computational demands were modest. Given the nature of our experiments, we found that 1 or 2 GPUs were sufficient for the GPT-2-based myPoetry experiments. For other experiments, particularly those involving larger datasets, we predominantly used CPU-based computing infrastructure.
C2: yes
C2 Elaboration For Yes Or No: Hyperparameters are reported in Section 4, Methodologies, Table 2.
C3: yes
C3 Elaboration For Yes Or No: Section 5, Results and Discussion, Tables 3, 4, 5, 6, and 7. Each experiment was run only once; the experimental logs and notebooks will be released with the library.
C4: yes
C4 Elaboration For Yes Or No: In Section 5, Results and Discussion.
D: yes
D1: n/a
D1 Elaboration For Yes Or No: The datasets used in the experiments were collected and prepared by the authors themselves, who are also native speakers. The raw data were sourced from social media websites. Dataset information is given in Section 3, Datasets.
D2: n/a
D2 Elaboration For Yes Or No: Native-speaker students volunteered to translate the parallel corpora prepared to train our machine translation systems. The translated data were later checked by a language expert from the myNLP team for each language.
D3: n/a
D4: n/a
D4 Elaboration For Yes Or No: They all agreed to help with the research and development of their own languages, since the native languages of Myanmar are low-resource languages.
D5: n/a
E: no
E1 Elaboration For Yes Or No: Still learning ...