Abstract: myNLP is a free, open-source natural language processing (NLP) library focused on the Myanmar language. The library is implemented in the Python programming language and benchmarked on available Myanmar corpora. In this paper, we outline and compare different approaches for each of the language processing functionalities, as well as the datasets and pre-trained models. The library is organized hierarchically, comprising language processing functions and models for different NLP tasks. It will be publicly released on GitHub, with some larger models hosted on Hugging Face.
Paper Type: long
Research Area: NLP Applications
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: Myanmar language
Preprint Status: There is no non-anonymous preprint and we do not intend to release one.
A1: yes
A1 Elaboration For Yes Or No: In Section 5, Results and Discussion, we discuss our NER model, which has low precision, recall, and F1-score. Data scarcity is one of the major problems for low-resource languages such as Myanmar.
A2: n/a
A2 Elaboration For Yes Or No: Our paper is a benchmarking and system demonstration paper. We do not foresee any potential risks arising from our work.
A3: yes
A3 Elaboration For Yes Or No: Abstract and Section 1 Introduction
B: yes
B1: yes
B1 Elaboration For Yes Or No: We added footnotes and cited all artifacts properly, such as OpenNMT-tf in Section 2, Functionalities (Subsection 2.10, Machine Translation) and TensorFlow in Section 4, Methodologies.
B2: n/a
B3: n/a
B4: n/a
B4 Elaboration For Yes Or No: The datasets we used are open source, and most of them were developed by ourselves for the purposes of this research.
B5: n/a
B6: yes
B6 Elaboration For Yes Or No: Section 3 Datasets, Table 1
C: yes
C1: yes
C1 Elaboration For Yes Or No: Our experiments primarily focused on low-resource datasets, for which the computational demands were modest. Given the nature of our experiments, we found that 1 or 2 GPUs were sufficient for the GPT-2-based myPoetry experiments. For other experiments, particularly those involving larger datasets, we predominantly used CPU-based computing infrastructure.
C2: yes
C2 Elaboration For Yes Or No: Hyperparameters are reported in Section 4, Methodologies, Table 2.
C3: yes
C3 Elaboration For Yes Or No: Section 5, Results and Discussion, Tables 3, 4, 5, 6, and 7. Each experiment was run only once; the experimental logs and notebooks will be released with the library.
C4: yes
C4 Elaboration For Yes Or No: In Section 5, Results and Discussion.
D: yes
D1: n/a
D1 Elaboration For Yes Or No: The datasets used in the experiments were collected and prepared by the authors themselves, who are also native speakers. The raw data were sourced from social media websites. Dataset information is given in Section 3, Datasets.
D2: n/a
D2 Elaboration For Yes Or No: Native-speaker students volunteered to translate the parallel corpora prepared to train our machine translation systems. The translated data were later checked by a language expert from the myNLP team for each language.
D3: n/a
D4: n/a
D4 Elaboration For Yes Or No: They all agreed to help with the research and development of their own languages, since the native languages of Myanmar are low-resource languages.
D5: n/a
E: no
E1 Elaboration For Yes Or No: Still learning ...