Here is code for training expert models (primarily based on OpenRLHF with modifications to the SFT Trainer logic).