An Augmentative and Alternative Communication Synthetic Corpus for Brazilian Portuguese

Published: 01 Jan 2023, Last Modified: 19 Feb 2025, ICALT 2023, CC BY-SA 4.0
Abstract: In recent years, Augmentative and Alternative Communication (AAC) systems have grown significantly in Brazil, particularly for individuals with cognitive disorders who rely on high-tech AAC tools. Artificial Intelligence (AI) has substantially improved high-tech AAC systems by enhancing accessibility, speeding up output generation, and making AAC interfaces more customizable and adaptable. This study investigates the use of Large Language Models (LLMs) to generate synthetic text data that augments an AAC corpus in Brazilian Portuguese. A three-step method (sentence collection, corpus augmentation with GPT-3 in a few-shot setting, and corpus cleaning) was used to expand an initial corpus of 667 AAC-like sentences produced by specialists into a corpus of 13k sentences. The quality and reliability of the generated corpus were assessed through a coverage analysis that compares the content of the generated sentences with the original human-composed sentences. The results provide insights into the method's strengths and limitations and inform future efforts to improve synthetic text generation for the AAC domain in Brazilian Portuguese.
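The abstract names the pipeline stages but not their details, so the following is only a minimal sketch of how the few-shot prompting, cleaning, and coverage steps could look. Everything here is an assumption for illustration: the helper names (`build_few_shot_prompt`, `clean_corpus`, `vocabulary_coverage`), the prompt wording, the cleaning rules (drop empty lines, duplicates, and verbatim copies of seed sentences), and the choice of word-type overlap as the coverage metric. None of it is taken from the paper, and the actual GPT-3 call is omitted.

```python
import random
import re
import unicodedata


def normalize(sentence: str) -> str:
    # NFC-normalize accents, lowercase, trim, and collapse internal whitespace.
    s = unicodedata.normalize("NFC", sentence).strip().lower()
    return re.sub(r"\s+", " ", s)


def build_few_shot_prompt(seed, k=5, rng=None):
    """Hypothetical prompt builder: k randomly sampled seed sentences as examples."""
    rng = rng or random.Random(0)
    shots = rng.sample(seed, min(k, len(seed)))
    lines = ["Write one new short AAC-style sentence in Brazilian Portuguese:"]
    lines += [f"- {s}" for s in shots]
    lines.append("-")  # the model is expected to complete this last item
    return "\n".join(lines)


def clean_corpus(generated, seed):
    """Hypothetical cleaning step: drop empty lines, duplicates, and seed copies."""
    seen = {normalize(s) for s in seed}
    kept = []
    for s in generated:
        n = normalize(s)
        if not n or n in seen:
            continue
        seen.add(n)
        kept.append(s.strip())
    return kept


def tokens(sentence):
    # \w is Unicode-aware for str patterns, so "água" stays one token.
    return re.findall(r"\w+", normalize(sentence))


def vocabulary_coverage(original, generated):
    """Assumed metric: share of original word types that also occur in the generated corpus."""
    orig_vocab = {t for s in original for t in tokens(s)}
    gen_vocab = {t for s in generated for t in tokens(s)}
    if not orig_vocab:
        return 0.0
    return len(orig_vocab & gen_vocab) / len(orig_vocab)


if __name__ == "__main__":
    seed = ["Eu quero água.", "Quero brincar agora."]
    generated = ["Eu quero comer.", "eu quero água.", "Quero brincar lá fora.", "", "Eu quero comer."]
    print(build_few_shot_prompt(seed, k=2))
    kept = clean_corpus(generated, seed)
    print(kept)  # ['Eu quero comer.', 'Quero brincar lá fora.']
    print(vocabulary_coverage(seed, kept))  # 0.6
```

Word-type overlap is a deliberately simple stand-in; a real coverage analysis might instead compare lemmas, n-grams, or semantic categories of the AAC vocabulary.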