---
base_model: HuggingFaceH4/mistral-7b-sft-beta
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- /home/user/ultrafeedback_fullsorted_fix
model-index:
- name: zephyr-NCA-reward
  results: []
---


# zephyr-NCA-reward

This model is a fine-tuned version of [HuggingFaceH4/mistral-7b-sft-beta](https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta), trained on a dataset loaded from the local path `/home/user/ultrafeedback_fullsorted_fix`.
It achieves the following results on the evaluation set:
- Loss: 1.3007
- Loss/mini Gap Loss: 1.3007
- Loss/ori Loss: 1.3007
- Loss/reward Entrophy: 0.0
- Regularization/forward Kl: 0.5698
- Regularization/reverse Kl: 0.4143
- Regularization/policy Data Loss: 1.6956
- Regularization/reference Data Loss: 1.2661
- Regularization/policy Ref Data Loss Gap: 0.4295
- Mask/mask Ratio: 0.4577
- Reward/reward A0: -0.0038
- Reward/reward A1: -0.1788
- Reward/reward A2: -0.3592
- Reward/reward A3: -0.6457
- Rewards/chosen: -0.0038
- Rewards/rejected: -0.3945
- Rewards/margins: 0.3908
- Reward/a01 Acc: 0.6449
- Reward/a02 Acc: 0.7396
- Reward/a03 Acc: 0.8344
- Rewards/accuracies: 0.7396
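
Several of the aggregate metrics above appear to be simple combinations of the per-response ones. The sketch below checks that reading against the reported numbers. It is an observation drawn from the values in this card, not something confirmed by the training code: treating `A0` as the chosen response and `A1` to `A3` as the rejected ones is an assumption, and the variable names are illustrative only.

```python
# Final evaluation metrics copied from the list above.
reward = {"A0": -0.0038, "A1": -0.1788, "A2": -0.3592, "A3": -0.6457}
pair_acc = {"a01": 0.6449, "a02": 0.7396, "a03": 0.8344}
policy_data_loss = 1.6956
reference_data_loss = 1.2661

# Assumption: A0 is the chosen response, A1..A3 are the rejected ones.
chosen = reward["A0"]
rejected = (reward["A1"] + reward["A2"] + reward["A3"]) / 3

# Margin between the chosen reward and the mean rejected reward.
margin = chosen - rejected

# Mean of the three pairwise accuracies (A0 vs. A1, A2, A3).
mean_acc = sum(pair_acc.values()) / len(pair_acc)

# Gap between the policy and reference losses on the data.
loss_gap = policy_data_loss - reference_data_loss

# Values reported in the card, for comparison (rounded to 4 decimals there).
reported = {
    "rejected": -0.3945,
    "margin": 0.3908,
    "mean_acc": 0.7396,
    "loss_gap": 0.4295,
}
computed = {
    "rejected": rejected,
    "margin": margin,
    "mean_acc": mean_acc,
    "loss_gap": loss_gap,
}
for name, value in computed.items():
    print(f"{name}: computed {value:.4f}, reported {reported[name]:.4f}")
```

Under this reading, `Rewards/rejected` is the mean of `Reward/reward A1` to `A3`, `Rewards/accuracies` is the mean of the three pairwise accuracies, and small differences in the last digit come from rounding in the logged values.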

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
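
The totals in the list above follow from the per-device settings, and the learning rate follows a warmup-then-cosine shape. The sketch below reproduces both in plain Python. The function `lr_at` and the constant `ASSUMED_TOTAL_STEPS` are hypothetical names introduced here; the step count is only a rough estimate read off the results table (step 1800 is logged at about epoch 0.96), and the schedule is a generic linear-warmup plus cosine-decay formula rather than a copy of the trainer's implementation.

```python
import math

# Values from the hyperparameter list above.
learning_rate = 5e-6
train_batch_size = 1
eval_batch_size = 1
num_devices = 4
gradient_accumulation_steps = 8
warmup_ratio = 0.1

# Effective batch sizes: per-device size times devices (times accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# ASSUMPTION: rough estimate of optimizer steps in one epoch, not stated in the card.
ASSUMED_TOTAL_STEPS = 1875


def lr_at(step, total_steps=ASSUMED_TOTAL_STEPS, base_lr=learning_rate, ratio=warmup_ratio):
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
for step in (0, 100, 188, 1000, 1875):
    print(step, lr_at(step))
```

With these settings the learning rate rises linearly over roughly the first tenth of training, peaks at `5e-06`, and then decays toward zero by the final step.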

### Training results

| Training Loss | Epoch | Step | Validation Loss | Loss/mini Gap Loss | Loss/ori Loss | Loss/reward Entrophy | Regularization/forward Kl | Regularization/reverse Kl | Regularization/policy Data Loss | Regularization/reference Data Loss | Regularization/policy Ref Data Loss Gap | Mask/mask Ratio | Reward/reward A0 | Reward/reward A1 | Reward/reward A2 | Reward/reward A3 | Rewards/chosen | Rewards/rejected | Rewards/margins | Reward/a01 Acc | Reward/a02 Acc | Reward/a03 Acc | Rewards/accuracies |
|:-------------:|:-----:|:----:|:---------------:|:------------------:|:-------------:|:--------------------:|:-------------------------:|:-------------------------:|:-------------------------------:|:----------------------------------:|:---------------------------------------:|:---------------:|:----------------:|:----------------:|:----------------:|:----------------:|:--------------:|:----------------:|:---------------:|:--------------:|:--------------:|:--------------:|:------------------:|
| 1.3845        | 0.05  | 100  | 1.3843          | 1.3843             | 1.3843        | 0.0                  | 0.0006                    | 0.0006                    | 1.2682                          | 1.2661                             | 0.0022                                  | 0.4577          | 0.0030           | -0.0001          | -0.0023          | -0.0049          | 0.0030         | -0.0024          | 0.0054          | 0.5932         | 0.6579         | 0.7117         | 0.6542             |
| 1.3641        | 0.11  | 200  | 1.3632          | 1.3632             | 1.3632        | 0.0                  | 0.0688                    | 0.0617                    | 1.3653                          | 1.2661                             | 0.0992                                  | 0.4577          | -0.0453          | -0.0905          | -0.1223          | -0.1596          | -0.0453        | -0.1241          | 0.0788          | 0.6082         | 0.6791         | 0.7396         | 0.6756             |
| 1.3464        | 0.16  | 300  | 1.3430          | 1.3430             | 1.3430        | 0.0                  | 0.2320                    | 0.1950                    | 1.3931                          | 1.2661                             | 0.1270                                  | 0.4577          | -0.0499          | -0.1410          | -0.2129          | -0.3031          | -0.0499        | -0.2190          | 0.1691          | 0.6304         | 0.6988         | 0.7671         | 0.6988             |
| 1.3387        | 0.21  | 400  | 1.3285          | 1.3285             | 1.3285        | 0.0                  | 0.4617                    | 0.3766                    | 1.4589                          | 1.2661                             | 0.1928                                  | 0.4577          | -0.0167          | -0.1373          | -0.2414          | -0.3912          | -0.0167        | -0.2566          | 0.2399          | 0.6356         | 0.7076         | 0.7930         | 0.7120             |
| 1.3309        | 0.27  | 500  | 1.3204          | 1.3204             | 1.3204        | 0.0                  | 0.4646                    | 0.3825                    | 1.4782                          | 1.2661                             | 0.2121                                  | 0.4577          | -0.0003          | -0.1341          | -0.2534          | -0.4304          | -0.0003        | -0.2727          | 0.2723          | 0.6372         | 0.7107         | 0.8100         | 0.7193             |
| 1.325         | 0.32  | 600  | 1.3164          | 1.3164             | 1.3164        | 0.0                  | 0.5434                    | 0.4317                    | 1.5453                          | 1.2661                             | 0.2792                                  | 0.4577          | -0.0366          | -0.1874          | -0.3337          | -0.5403          | -0.0366        | -0.3538          | 0.3172          | 0.6335         | 0.7205         | 0.8100         | 0.7214             |
| 1.3311        | 0.37  | 700  | 1.3122          | 1.3122             | 1.3122        | 0.0                  | 0.5382                    | 0.4264                    | 1.5599                          | 1.2661                             | 0.2938                                  | 0.4577          | -0.0042          | -0.1527          | -0.2999          | -0.5274          | -0.0042        | -0.3267          | 0.3224          | 0.6413         | 0.7200         | 0.8245         | 0.7286             |
| 1.3112        | 0.42  | 800  | 1.3086          | 1.3086             | 1.3086        | 0.0                  | 0.5743                    | 0.4255                    | 1.6721                          | 1.2661                             | 0.4060                                  | 0.4577          | -0.0112          | -0.1685          | -0.3250          | -0.5754          | -0.0112        | -0.3563          | 0.3451          | 0.6449         | 0.7334         | 0.8287         | 0.7357             |
| 1.3156        | 0.48  | 900  | 1.3082          | 1.3082             | 1.3082        | 0.0                  | 0.5717                    | 0.4240                    | 1.6341                          | 1.2661                             | 0.3680                                  | 0.4577          | -0.0214          | -0.1861          | -0.3578          | -0.6112          | -0.0214        | -0.3850          | 0.3637          | 0.6460         | 0.7360         | 0.8261         | 0.7360             |
| 1.3131        | 0.53  | 1000 | 1.3066          | 1.3066             | 1.3066        | 0.0                  | 0.5842                    | 0.4200                    | 1.7286                          | 1.2661                             | 0.4626                                  | 0.4577          | -0.0454          | -0.2257          | -0.4053          | -0.6707          | -0.0454        | -0.4339          | 0.3885          | 0.6506         | 0.7422         | 0.8328         | 0.7419             |
| 1.3092        | 0.58  | 1100 | 1.3040          | 1.3040             | 1.3040        | 0.0                  | 0.5668                    | 0.4164                    | 1.6753                          | 1.2661                             | 0.4092                                  | 0.4577          | -0.0194          | -0.1939          | -0.3686          | -0.6412          | -0.0194        | -0.4012          | 0.3818          | 0.6460         | 0.7428         | 0.8349         | 0.7412             |
| 1.3097        | 0.64  | 1200 | 1.3027          | 1.3028             | 1.3028        | 0.0                  | 0.5639                    | 0.4199                    | 1.6401                          | 1.2661                             | 0.3740                                  | 0.4577          | -0.0002          | -0.1708          | -0.3436          | -0.6201          | -0.0002        | -0.3782          | 0.3780          | 0.6444         | 0.7422         | 0.8395         | 0.7421             |
| 1.2929        | 0.69  | 1300 | 1.3019          | 1.3019             | 1.3019        | 0.0                  | 0.5674                    | 0.4188                    | 1.6644                          | 1.2661                             | 0.3983                                  | 0.4577          | -0.0039          | -0.1761          | -0.3536          | -0.6335          | -0.0039        | -0.3877          | 0.3838          | 0.6470         | 0.7417         | 0.8354         | 0.7414             |
| 1.3107        | 0.74  | 1400 | 1.3017          | 1.3017             | 1.3017        | 0.0                  | 0.5596                    | 0.4140                    | 1.6506                          | 1.2661                             | 0.3845                                  | 0.4577          | 0.0060           | -0.1611          | -0.3364          | -0.6151          | 0.0060         | -0.3708          | 0.3768          | 0.6444         | 0.7422         | 0.8333         | 0.7400             |
| 1.296         | 0.8   | 1500 | 1.3013          | 1.3013             | 1.3013        | 0.0                  | 0.5751                    | 0.4164                    | 1.7004                          | 1.2661                             | 0.4343                                  | 0.4577          | -0.0053          | -0.1799          | -0.3600          | -0.6481          | -0.0053        | -0.3960          | 0.3907          | 0.6465         | 0.7422         | 0.8349         | 0.7412             |
| 1.304         | 0.85  | 1600 | 1.3007          | 1.3007             | 1.3007        | 0.0                  | 0.5724                    | 0.4169                    | 1.6883                          | 1.2661                             | 0.4222                                  | 0.4577          | -0.0015          | -0.1760          | -0.3549          | -0.6421          | -0.0015        | -0.3910          | 0.3895          | 0.6434         | 0.7407         | 0.8370         | 0.7403             |
| 1.3101        | 0.9   | 1700 | 1.3006          | 1.3006             | 1.3006        | 0.0                  | 0.5671                    | 0.4145                    | 1.6800                          | 1.2661                             | 0.4139                                  | 0.4577          | 0.0013           | -0.1716          | -0.3500          | -0.6354          | 0.0013         | -0.3857          | 0.3870          | 0.6423         | 0.7396         | 0.8359         | 0.7393             |
| 1.2987        | 0.96  | 1800 | 1.3007          | 1.3008             | 1.3008        | 0.0                  | 0.5698                    | 0.4143                    | 1.6954                          | 1.2661                             | 0.4293                                  | 0.4577          | -0.0038          | -0.1785          | -0.3590          | -0.6456          | -0.0038        | -0.3944          | 0.3906          | 0.6449         | 0.7391         | 0.8349         | 0.7396             |


### Framework versions

- Transformers 4.35.0
- PyTorch 2.0.1+cu117
- Datasets 2.14.6
- Tokenizers 0.14.1
