PKU
RLHF
dataset
datasets
tokenize
tokenizer
tokenizers
tokenized
pre
len
bool
boolean
str
args
kwargs
attr
AutoModelForScore
AutoModelForCausalLM
os
PathLike
config
unoptimized
LongTensor
BoolTensor
finetune
finetuning
backend
reproducibility
logits
bigbench
isort
tf
rl
rollout
ptx
gae
chatbot
ppo
sft
subclass
subclasses
Anthropic
fnlp
normalizer
dtype
dpo
DPO
