Keywords: Machine learning, Deep learning, Large Language Models, LLM, Fine-tuning, Gemma, Google, Languages
Abstract: The rise of Large Language Models has not been inclusive of all cultures. These models are trained predominantly on English text and English-speaking cultural contexts, which causes them to underperform in other languages and cultures. By developing a generalizable method for preparing culturally relevant datasets and post-training the Gemma 2 model, this project aims to improve Gemma 2's performance for an underrepresented language and to show how others can do the same to unlock the power of Generative AI in their own countries and preserve their cultural heritage.
Submission Number: 35