### Exploring GRPO training on the raw Halawi dataset with the Hugging Face TRL library

The training scripts are currently broken.