Abstract: Recent advancements in large language models (LLMs) have remarkably enhanced performance on a variety of tasks in multiple languages.
However, the tokenizers of LLMs trained primarily on English-centric corpora often over-fragment text in non-Roman alphabetic languages into character- or Unicode-level tokens, leading to inefficient text generation.
We introduce a simple yet effective framework to accelerate text generation in such languages.
Our approach equips a pre-trained LLM with a new language model head whose vocabulary is tailored to a specific target language.
The new head is then fine-tuned, with a verification step that ensures the model's performance is preserved.
We show that this targeted fine-tuning, while freezing other model parameters, effectively reduces token fragmentation for the target language.
Our extensive experiments demonstrate that the proposed framework increases the generation speed by a factor of 1.7 while maintaining the performance of pre-trained multilingual models on target monolingual tasks.
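To make the setup concrete, the sketch below shows one way to attach a new LM head over a target-language vocabulary to a frozen pre-trained LLM and train only that head, as the abstract describes. This is a minimal illustration assuming Hugging Face Transformers with PyTorch; the class name `TargetHeadLM`, the placeholder vocabulary size, the base checkpoint, and the simplified label alignment are illustrative assumptions, not the authors' implementation, and the verification step is omitted.

```python
# Minimal sketch (illustrative, not the authors' released code): attach a new LM head
# over a target-language vocabulary to a frozen pre-trained LLM and train only that head.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM


class TargetHeadLM(nn.Module):
    def __init__(self, base_model_name: str, target_vocab_size: int):
        super().__init__()
        # Pre-trained multilingual LLM; all of its parameters stay frozen.
        self.base = AutoModelForCausalLM.from_pretrained(base_model_name)
        for p in self.base.parameters():
            p.requires_grad = False
        # New output head over the target-language vocabulary -- the only trainable part.
        self.new_head = nn.Linear(
            self.base.config.hidden_size, target_vocab_size, bias=False
        )

    def forward(self, input_ids, attention_mask=None, labels=None):
        # Reuse the frozen transformer body and project its final hidden states
        # with the new head instead of the original (English-centric) LM head.
        out = self.base(
            input_ids=input_ids,
            attention_mask=attention_mask,
            output_hidden_states=True,
        )
        logits = self.new_head(out.hidden_states[-1])
        loss = None
        if labels is not None:
            # Next-token cross-entropy; labels are assumed to be IDs in the new
            # target-language vocabulary, aligned one-to-one with the input
            # positions (an illustrative simplification).
            loss = nn.functional.cross_entropy(
                logits[:, :-1].reshape(-1, logits.size(-1)),
                labels[:, 1:].reshape(-1),
            )
        return logits, loss


# Only the new head's parameters are passed to the optimizer; the base model is untouched.
model = TargetHeadLM("meta-llama/Llama-2-7b-hf", target_vocab_size=32_000)
optimizer = torch.optim.AdamW(model.new_head.parameters(), lr=1e-4)
```

Because the transformer body is frozen, the base model's multilingual behavior under its original head is left intact; only the projection into the target-language vocabulary is learned.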
Paper Type: long
Research Area: Multilinguality and Language Diversity
Contribution Types: NLP engineering experiment, Approaches to low compute settings-efficiency
Languages Studied: Korean; Japanese
Preprint Status: There is a non-anonymous preprint (URL specified in the next question).
A1: yes
A1 Elaboration For Yes Or No: 7
A2: no
A3: yes
A3 Elaboration For Yes Or No: 1
B: yes
B1: yes
B1 Elaboration For Yes Or No: References
B2: yes
B2 Elaboration For Yes Or No: 4. Experiment
B3: yes
B3 Elaboration For Yes Or No: 4. Experiment
B4: yes
B4 Elaboration For Yes Or No: 4. Experiment
B5: yes
B5 Elaboration For Yes Or No: 4. Experiment; Appendix
C: yes
C1: yes
C1 Elaboration For Yes Or No: 4. Experiment; 5. Analysis; Appendix
C2: yes
C2 Elaboration For Yes Or No: Appendix
C3: yes
C3 Elaboration For Yes Or No: 4. Experiment
C4: yes
C4 Elaboration For Yes Or No: 4. Experiment
D: no
D1: n/a
D2: n/a
D3: n/a
D4: n/a
D5: n/a
E: yes
E1: n/a