All original LRC settings are located in `ReadMe_lrc_codebase.md`, the `modelling_lrc` folder, and `hf_trainer_lrc.py`.

Our training and testing environments are described below. The training environment is identical to LRC's, while the testing environment has been updated.

### 🟢 RED Training Environment

```bash
conda create -n lrc python=3.10 -y
conda activate lrc

# Install PyTorch
pip install torch==2.3.0

# Install core libraries with strict versioning for training
# The quotes keep shells such as zsh from expanding the square brackets
pip install "transformers[torch]==4.41.2"
pip install deepspeed==0.15.4
pip install accelerate==1.1.1
pip install datasets==2.19.2
pip install datatrove==0.3.0 fire matplotlib seaborn wandb

# Install Flash Attention for optimized performance
MAX_JOBS=8 pip install flash-attn --no-build-isolation
```
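
After installation, it can help to confirm that the pinned versions are the ones actually resolved in the active environment. The following is a minimal sketch, not part of the repository: the `PINNED` table simply mirrors the install commands above, and `check_pins` is a hypothetical helper name.

```python
from importlib import metadata

# Pins copied from the install commands above (flash-attn is unpinned there, so it is omitted).
PINNED = {
    "torch": "2.3.0",
    "transformers": "4.41.2",
    "deepspeed": "0.15.4",
    "accelerate": "1.1.1",
    "datasets": "2.19.2",
    "datatrove": "0.3.0",
}


def check_pins(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for each mismatched or missing package."""
    problems = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build tags such as "+cu121" when comparing versions.
        if found is None or found.split("+")[0] != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty output means every pinned package matches. In the `lm_eval` environment, `transformers` is expected to differ because it is upgraded there on purpose.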

### 🟡 `lm_eval` Environment

<details>
<summary>Why a separate environment for lm_eval?</summary>

The `lm_eval` library often requires a newer `transformers` version than what is compatible with the `deepspeed` setup used for LRC training. To avoid dependency conflicts, we recommend using a separate environment for evaluation.
</details>

```bash
conda create -n lm_eval --clone lrc
conda activate lm_eval

# Upgrade transformers for lm_eval compatibility
pip install transformers==4.53.3
pip install lm_eval==0.4.9.2

# lm_eval source: https://github.com/EleutherAI/lm-evaluation-harness.git
```

To obtain the important dimensions, run `python llm_trim.py`.

In the RED series, `hf_trainer.py` includes an initialization call for RMSNorm. Activation-aware initialization has been added to the corresponding modelling files, and the values that require initialization are already included.

First, run `generate_general_data_parallel.py` in the `data` directory to create the distilled dataset.

Then train RED-1.5B, using `meta-llama/Llama-3.2-3B-Instruct` as the teacher model.

```bash
accelerate launch --main_process_port 12231 --config_file "configs/accel_ds_8h800_gas1.yaml" hf_trainer.py \
  --log_steps 100 \
  --max_grad_norm 1.0 \
  --learning-rate 1e-4 \
  --gradient_accumulation_steps 1 \
  --max_steps 208000 \
  --dataset_name ./datasets/mix_general_llama3_tokenized_v5.1/train.jsonl \
  --batch-size 3 \
  --data-max-len 2048 \
  --save_steps 20000 \
  --check_data_cls_loss False \
  --target_hidden_size 1536 \
  --kl_temperature 1 \
  --warmup-ratio 0.005 \
  --raw-model-name /path/to/your/TEACHER_MODEL \
  --extra_tags general_train,8h800,arch,try_sota,all_ffn,all_attn \
  --use_accelerate True \
  --output_dir ./ckpts \
  --str_ban_losses no \
  --tie_word_emb_proj 1 \
  --use_all_attn 1 \
  --aux_loss_scale_factor 0.2
```
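
To get a feel for the scale of this run, the flags above can be turned into a rough token budget. The sketch below is only arithmetic under stated assumptions, none of which is confirmed by the source: the config name `accel_ds_8h800_gas1.yaml` is read as 8 GPUs, `--batch-size` is treated as a per-device value, every sequence is counted at the full `--data-max-len` (so the total is an upper bound), and warmup is taken as `warmup-ratio × max_steps`. `training_budget` is a hypothetical helper, not a function in the repository.

```python
def training_budget(num_gpus, per_device_batch, grad_accum, seq_len, max_steps, warmup_ratio):
    """Estimate per-step and total token counts for a data-parallel run."""
    sequences_per_step = num_gpus * per_device_batch * grad_accum
    tokens_per_step = sequences_per_step * seq_len
    return {
        "sequences_per_step": sequences_per_step,
        "tokens_per_step": tokens_per_step,
        # Upper bound: assumes every sequence fills the full context length.
        "max_total_tokens": tokens_per_step * max_steps,
        "warmup_steps": round(max_steps * warmup_ratio),
    }


# Values taken from the launch command; num_gpus=8 is an assumption based on the config name.
budget = training_budget(
    num_gpus=8,
    per_device_batch=3,
    grad_accum=1,
    seq_len=2048,
    max_steps=208_000,
    warmup_ratio=0.005,
)

if __name__ == "__main__":
    for key, value in budget.items():
        print(f"{key}: {value:,}")
```

Under these assumptions the run processes 24 sequences (49,152 tokens) per step, at most about 10.2B tokens in total, with roughly 1,040 warmup steps.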

After training, convert the checkpoint to Hugging Face format and evaluate it with `lm_eval`.

```bash
python convert_ckpt.py \
  --ckpt-path /path/to/your/LRC_CKPT.safetensors \
  --target-hidden-size 1536 \
  --raw-model-name /path/to/your/TEACHER_MODEL \
  --save-path /path/to/save/your/STUDENT_MODEL \
  --use-all-attn 1 \
  --use-in-out-mlp 1 \
  --tie-word-emb-proj 1
```
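
Before running a full evaluation, a quick check of the converted directory can catch a wrong `--target-hidden-size` early. This is a minimal sketch that assumes `convert_ckpt.py` writes a standard Hugging Face `config.json` containing a `hidden_size` field; the source does not confirm the exact layout, and `check_student_config` is a hypothetical helper.

```python
import json
from pathlib import Path


def check_student_config(model_dir, expected_hidden_size=1536):
    """Return (matches, found_hidden_size) for the config.json in model_dir."""
    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        raise FileNotFoundError(f"No config.json found in {model_dir}")
    config = json.loads(config_path.read_text())
    # Assumption: the student config exposes the width as "hidden_size".
    found = config.get("hidden_size")
    return found == expected_hidden_size, found
```

For example, `check_student_config("/path/to/save/your/STUDENT_MODEL")` would return `(True, 1536)` if the conversion produced a 1536-wide student and the assumed field name holds.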


```bash
lm_eval --model hf \
  --model_args pretrained=/path/to/save/your/STUDENT_MODEL,trust_remote_code=true \
  --tasks humaneval,mbpp,hellaswag,winogrande,arc_challenge,boolq,piqa,truthfulqa,mmlu,gsm8k,arc_easy \
  --device cuda:0 \
  --batch_size auto:1 \
  --confirm_run_unsafe_code \
  --trust_remote_code \
  --output_path ./eval_out/
```
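
Once the evaluation finishes, the scores can be gathered from the files under `./eval_out/`. The sketch below assumes the harness writes `results_*.json` files whose top-level `results` key maps each task to its metrics; treat that layout as an assumption and adjust it if your version differs. `collect_results` is a hypothetical helper, not part of `lm_eval`.

```python
import json
from pathlib import Path


def collect_results(output_dir):
    """Map task name -> {metric: value} from every results_*.json under output_dir."""
    summary = {}
    # Files are visited in sorted order, so later files overwrite earlier ones per task.
    for path in sorted(Path(output_dir).rglob("results_*.json")):
        data = json.loads(path.read_text())
        for task, metrics in data.get("results", {}).items():
            summary[task] = {
                key: value
                for key, value in metrics.items()
                # Keep numeric scores only; drop aliases and standard errors.
                if isinstance(value, (int, float)) and "stderr" not in key
            }
    return summary


if __name__ == "__main__":
    for task, metrics in collect_results("./eval_out").items():
        print(task, metrics)
```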