These datasets are derived from PKU-SafeRLHF ("1w" and "2w" in the file names mean 10,000 and 20,000, from the Chinese unit 万):
pku-positive-1w.json contains 10,000 questions with positive responses.
pku-negative-1w.json contains 10,000 questions with negative responses.
pku-full-2w.json contains 20,000 questions: 10,000 with positive responses and 10,000 with negative responses.
pku-dpo-1w.jsonl contains 10,000 questions for DPO (Direct Preference Optimization), each paired with one positive and one negative response.
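The three .json files and the .jsonl file use different layouts, so they need slightly different loading code. The following is a minimal sketch, assuming each .json file holds a single top-level JSON array of records and the .jsonl file holds one JSON object per line. The record field names are not documented here, so the DPO keys `prompt`, `chosen`, and `rejected` below are hypothetical placeholders; check them against the actual files before relying on the check.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Sequence, Union

# Hypothetical DPO field names; replace with the keys actually used in pku-dpo-1w.jsonl.
DPO_KEYS = ("prompt", "chosen", "rejected")


def load_records(path: Union[str, Path]) -> List[Dict[str, Any]]:
    """Load a dataset file.

    Assumes a .jsonl file has one JSON object per line and any other file
    (e.g. .json) holds a single top-level JSON array of records.
    """
    path = Path(path)
    with path.open(encoding="utf-8") as f:
        if path.suffix == ".jsonl":
            # JSON Lines: parse each non-empty line as an independent object.
            return [json.loads(line) for line in f if line.strip()]
        # Plain JSON: parse the whole file at once.
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"{path} does not contain a top-level JSON array")
    return data


def count_incomplete_dpo_records(
    records: Sequence[Dict[str, Any]], keys: Sequence[str] = DPO_KEYS
) -> int:
    """Return how many records lack at least one of the expected keys."""
    return sum(1 for r in records if not all(k in r for k in keys))
```

For example, `len(load_records("pku-dpo-1w.jsonl"))` should match the 10,000 questions described above if the file follows the assumed one-object-per-line layout, and `count_incomplete_dpo_records` returning a nonzero value would suggest the real field names differ from the placeholders.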
