[2025-04-17 09:30:49,465] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
INFO:root:Using nproc_per_node=8.
W0417 09:30:50.697000 140131680359040 torch/distributed/run.py:757] 
W0417 09:30:50.697000 140131680359040 torch/distributed/run.py:757] *****************************************
W0417 09:30:50.697000 140131680359040 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W0417 09:30:50.697000 140131680359040 torch/distributed/run.py:757] *****************************************
[2025-04-17 09:30:53,174] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-04-17 09:30:53,216] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-04-17 09:30:53,246] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-04-17 09:30:53,252] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-04-17 09:30:53,257] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-04-17 09:30:53,262] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-04-17 09:30:53,264] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-04-17 09:30:53,265] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[93m [WARNING] [0m async_io requires the dev libaio .so object and headers but these were not found.
[93m [WARNING] [0m async_io requires the dev libaio .so object and headers but these were not found.
[93m [WARNING] [0m async_io: please install the libaio-dev package with apt
[93m [WARNING] [0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[93m [WARNING] [0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[93m [WARNING] [0m async_io: please install the libaio-dev package with apt
[93m [WARNING] [0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[93m [WARNING] [0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[93m [WARNING] [0m async_io requires the dev libaio .so object and headers but these were not found.
[93m [WARNING] [0m async_io requires the dev libaio .so object and headers but these were not found.
[93m [WARNING] [0m async_io requires the dev libaio .so object and headers but these were not found.
[93m [WARNING] [0m async_io requires the dev libaio .so object and headers but these were not found.
[93m [WARNING] [0m async_io requires the dev libaio .so object and headers but these were not found.
[93m [WARNING] [0m async_io requires the dev libaio .so object and headers but these were not found.
[93m [WARNING] [0m async_io: please install the libaio-dev package with apt
[93m [WARNING] [0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[93m [WARNING] [0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[93m [WARNING] [0m async_io: please install the libaio-dev package with apt
[93m [WARNING] [0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[93m [WARNING] [0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[93m [WARNING] [0m async_io: please install the libaio-dev package with apt
[93m [WARNING] [0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[93m [WARNING] [0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[93m [WARNING] [0m async_io: please install the libaio-dev package with apt
[93m [WARNING] [0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[93m [WARNING] [0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[93m [WARNING] [0m async_io: please install the libaio-dev package with apt
[93m [WARNING] [0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[93m [WARNING] [0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[93m [WARNING] [0m async_io: please install the libaio-dev package with apt
[93m [WARNING] [0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[93m [WARNING] [0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[93m [WARNING] [0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[93m [WARNING] [0m using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[93m [WARNING] [0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[93m [WARNING] [0m using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[93m [WARNING] [0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[93m [WARNING] [0m using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[93m [WARNING] [0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[93m [WARNING] [0m using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[93m [WARNING] [0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[93m [WARNING] [0m using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[93m [WARNING] [0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[93m [WARNING] [0m using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[93m [WARNING] [0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[93m [WARNING] [0m using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[2025-04-17 09:30:54,303] [INFO] [comm.py:637:init_distributed] cdb=None
[93m [WARNING] [0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[93m [WARNING] [0m using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[2025-04-17 09:30:54,323] [INFO] [comm.py:637:init_distributed] cdb=None
[2025-04-17 09:30:54,323] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2025-04-17 09:30:54,377] [INFO] [comm.py:637:init_distributed] cdb=None
[2025-04-17 09:30:54,377] [INFO] [comm.py:637:init_distributed] cdb=None
[2025-04-17 09:30:54,377] [INFO] [comm.py:637:init_distributed] cdb=None
[2025-04-17 09:30:54,402] [INFO] [comm.py:637:init_distributed] cdb=None
/home/username/.conda/envs/llm/lib/python3.11/site-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
2025-04-17 09:30:54 - INFO - __main__ - Model parameters ModelArguments(base_model_revision=None, model_name_or_path='/data/username/grafting/saves/llama3-8b/full/sft_math', model_revision='main', model_code_revision=None, torch_dtype=None, tokenizer_name_or_path=None, trust_remote_code=False, attn_implementation=None, use_peft=False, lora_r=16, lora_alpha=32, lora_dropout=0.05, lora_target_modules=None, lora_modules_to_save=None, load_in_8bit=False, load_in_4bit=False, bnb_4bit_quant_type='nf4', use_bnb_nested_quant=False, bnb_4bit_quant_storage='uint8')
2025-04-17 09:30:54 - INFO - __main__ - Data parameters DataArguments(chat_template=None, dataset_mixer={'/data/public/grafting/dpo_math': 1.0}, text_column='text', dataset_splits=['train'], dataset_configs=None, preprocessing_num_workers=12, truncation_side=None, auto_insert_empty_system_msg=True)
2025-04-17 09:30:54 - INFO - __main__ - Training/evaluation parameters DPOConfig(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
batch_eval_metrics=False,
beta=10,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
dataset_num_proc=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_dropout=True,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=10000,
eval_strategy=IntervalStrategy.NO,
eval_use_gather_object=False,
evaluation_strategy=None,
f_alpha_divergence_coef=1.0,
f_divergence_type=FDivergenceType.REVERSE_KL,
force_use_ref_model=False,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generate_during_eval=False,
gradient_accumulation_steps=16,
gradient_checkpointing=True,
gradient_checkpointing_kwargs={'use_reentrant': False},
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
is_encoder_decoder=None,
jit_mode_eval=False,
label_names=None,
label_pad_token_id=-100,
label_smoothing=0,
label_smoothing_factor=0.0,
learning_rate=8e-07,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=info,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/data/username/grafting/saves/llama3-8b/full/dpo_math2/runs/Apr17_09-30-54_amax,
logging_first_step=True,
logging_nan_inf_filter=True,
logging_steps=2,
logging_strategy=IntervalStrategy.STEPS,
loss_type=length_normalization,
lr_scheduler_kwargs={},
lr_scheduler_type=SchedulerType.COSINE,
max_grad_norm=1.0,
max_length=2048,
max_prompt_length=512,
max_steps=-1,
max_target_length=None,
metric_for_best_model=None,
model_adapter_name=None,
model_init_kwargs=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=1,
optim=OptimizerNames.ADAMW_TORCH,
optim_args=None,
optim_target_modules=None,
output_dir=/data/username/grafting/saves/llama3-8b/full/dpo_math2,
overwrite_output_dir=False,
padding_value=None,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=1,
precompute_ref_log_probs=False,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
ref_adapter_name=None,
ref_model_init_kwargs=None,
ref_model_mixup_alpha=0.9,
ref_model_sync_steps=64,
reference_free=False,
remove_unused_columns=False,
report_to=['tensorboard'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
rpo_alpha=None,
run_name=/data/username/grafting/saves/llama3-8b/full/dpo_math2,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=1000,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=2,
seed=42,
skip_memory_metrics=True,
split_batches=None,
sync_ref_model=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torch_empty_cache_steps=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
truncation_mode=keep_end,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.1,
warmup_steps=0,
weight_decay=0.0,
)
44444 Dataset({
    features: ['rejected', 'dataset', 'source_model', 'prompt', 'chosen'],
    num_rows: 11615
})
2025-04-17 09:30:54 - INFO - __main__ - Training on the following splits: ['train : 11615']
[INFO|tokenization_utils_base.py:2287] 2025-04-17 09:30:54,439 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2287] 2025-04-17 09:30:54,439 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2287] 2025-04-17 09:30:54,439 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2287] 2025-04-17 09:30:54,439 >> loading file tokenizer_config.json
[2025-04-17 09:30:54,459] [INFO] [comm.py:637:init_distributed] cdb=None
[2025-04-17 09:30:54,474] [INFO] [comm.py:637:init_distributed] cdb=None
[INFO|tokenization_utils_base.py:2533] 2025-04-17 09:30:54,706 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Formatting comparisons with prompt template (num_proc=12): 100%|██████████| 11615/11615 [00:02<00:00, 4253.81 examples/s]
2025-04-17 09:30:57 - INFO - __main__ - Decontaminated 0 (0.00%) samples from the training set.
2025-04-17 09:30:57 - INFO - __main__ - Prompt sample 10476 of the raw training set:

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

A large rectangle is divided into five identical smaller squares. If the perimeter of the large rectangle is 180 cm, what is its area in square centimeters?<|eot_id|>
2025-04-17 09:30:57 - INFO - __main__ - Chosen sample 10476 of the raw training set:

<|begin_of_text|><|start_header_id|>assistant<|end_header_id|>

Here's how to solve the problem step-by-step:

**1. Understand the Setup**

Imagine a rectangle divided into five equal squares arranged in a row.  

**2. Define Variables**

* Let 's' represent the side length of each small square.
* The rectangle's length will be 5s (since it's made of five squares).
* The rectangle's width will be 's'.

**3. Use the Perimeter Information**

* The perimeter of a rectangle is calculated as: P = 2 * (length + width)
* We know P = 180 cm, so: 180 = 2 * (5s + s)

**4. Solve for 's'**

* Simplify the equation: 180 = 2 * (6s)
* Further simplification: 180 = 12s
* Divide both sides by 12: s = 15 cm

**5. Calculate the Rectangle's Dimensions**

* Length = 5s = 5 * 15 cm = 75 cm
* Width = s = 15 cm

**6. Calculate the Area**

* Area of a rectangle = length * width
* Area = 75 cm * 15 cm = 1125 square centimeters.

**Answer:**  The area of the large rectangle is $\boxed{1125}$ square centimeters.<|eot_id|>
2025-04-17 09:30:57 - INFO - __main__ - Rejected sample 10476 of the raw training set:

<|begin_of_text|><|start_header_id|>assistant<|end_header_id|>

Here's how to solve the problem step-by-step:

**1. Understand the Setup**

Imagine a rectangle divided into a 3x2 grid of smaller squares.  This means the rectangle has 3 squares across its width and 2 squares along its length.

**2. Find the Side Length of a Square**

* Since the rectangle is made up of five identical squares, each square's side length is equal to 1/5 of the rectangle's perimeter divided by 2 (since each square has two sides contributing to the perimeter).

* Perimeter of rectangle / 5 = (180 cm) / 5 = 36 cm
* Side length of a square = 36 cm / 2 = 18 cm

**3. Calculate the Dimensions of the Rectangle**

* Width of rectangle = 3 squares * 18 cm/square = 54 cm
* Length of rectangle = 2 squares * 18 cm/square = 36 cm

**4. Calculate the Area**

* Area of rectangle = width * length = 54 cm * 36 cm = 1944 square cm

**Answer:** The area of the large rectangle is $\boxed{1944}$ square centimeters.<|eot_id|>
2025-04-17 09:30:57 - INFO - __main__ - Prompt sample 1824 of the raw training set:

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Determine the coordinates of the third vertex of an isosceles triangle, given that the vertices are located at (1,2) and (4,0), and the base is 5 units long. The third vertex should be located on the line y = x + 1.<|eot_id|>
2025-04-17 09:30:57 - INFO - __main__ - Chosen sample 1824 of the raw training set:

<|begin_of_text|><|start_header_id|>assistant<|end_header_id|>

Here's how to solve the problem step-by-step:

**1. Find the midpoint of the base.**

The midpoint of the base will be the point equidistant from the two given vertices.  We can find this using the midpoint formula:

Midpoint =  ((x₁ + x₂)/2 , (y₁ + y₂)/2)

Midpoint = ((1 + 4)/2 , (2 + 0)/2) = (2.5, 1)

**2.  Determine the perpendicular bisector of the base.**

The perpendicular bisector of a line segment passes through its midpoint and is perpendicular to the segment.  

* **Slope of the base:** The slope of the line segment connecting (1,2) and (4,0) is (0 - 2)/(4 - 1) = -2/3.

* **Slope of the perpendicular bisector:** The slope of a line perpendicular to the base is the negative reciprocal of the base's slope, which is 3/2.

* **Equation of the perpendicular bisector:** We know the slope (3/2) and a point on the line (the midpoint (2.5, 1)). Using the point-slope form of a linear equation:
   y - y₁ = m(x - x₁)
   y - 1 = (3/2)(x - 2.5)

**3. Find the intersection of the perpendicular bisector and the line y = x + 1.**

This intersection point will be the third vertex of the triangle. To find it, solve the system of equations:

* y - 1 = (3/2)(x - 2.5)
* y = x + 1

Substitute the second equation into the first:

(x + 1) - 1 = (3/2)(x - 2.5)

Simplify and solve for x:

x = (3/2)x - 3.75
(-1/2)x = -3.75
x = 7.5

Substitute the value of x back into the equation y = x + 1 to find y:

y = 7.5 + 1 = 8.5

**4. The Coordinates**

Therefore, the third vertex of the isosceles triangle is located at  \boxed{(7.5, 8.5)}.<|eot_id|>
2025-04-17 09:30:57 - INFO - __main__ - Rejected sample 1824 of the raw training set:

<|begin_of_text|><|start_header_id|>assistant<|end_header_id|>

Here's how to solve the problem step-by-step:

**1. Find the midpoint of the base:**

* The midpoint of a line segment is found by averaging the x-coordinates and the y-coordinates of the endpoints.
* Midpoint =  ((1+4)/2, (2+0)/2) = (2.5, 1)

**2. Find the distance from the midpoint to the third vertex:**

* Since the triangle is isosceles, the altitude from the third vertex to the base bisects the base. This means the distance from the midpoint to the third vertex is half the length of the base, which is 5/2 = 2.5 units.

**3. Use the distance formula and the line equation:**

*  Let the third vertex be (x, y). We know y = x + 1.

* The distance formula tells us: √[(x - 2.5)² + (y - 1)²] = 2.5

* Substitute y = x + 1 into the distance formula: √[(x - 2.5)² + (x + 1 - 1)²] = 2.5

* Simplify: √[(x - 2.5)² + x²] = 2.5

**4. Solve for x:**

* Square both sides: (x - 2.5)² + x² = 6.25
* Expand: x² - 5x + 6.25 + x² = 6.25
* Combine like terms: 2x² - 5x = 0
* Factor: x(2x - 5) = 0
* This gives us two possible solutions for x: x = 0 or x = 2.5/2 = 1.25

**5. Find the corresponding y-values:**

* For x = 0, y = 0 + 1 = 1
* For x = 1.25, y = 1.25 + 1 = 2.25

**6. Determine the correct solution:**

* We need to choose the solution that is consistent with the isosceles triangle property.  The distance between (1, 2) and (1.25, 2.25) is not equal to the distance between (4, 0) and (1.25, 2.25). Therefore, the correct solution is (0, 1).



**Therefore, the coordinates of the third vertex are \boxed{(0, 1)}}.**<|eot_id|>
2025-04-17 09:30:57 - INFO - __main__ - Prompt sample 409 of the raw training set:

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Isosceles triangle $ABC$ has base $BC$ of length 20 and legs of length $x$.  Point $D$ is on $\overline{BC}$ and point $E$ is on $\overline{AC}$ such that $\overline{DE}$ is parallel to $\overline{AB}$.  If $CD=DE=8$, what is the length of $\overline{BE}$?<|eot_id|>
2025-04-17 09:30:57 - INFO - __main__ - Chosen sample 409 of the raw training set:

<|begin_of_text|><|start_header_id|>assistant<|end_header_id|>

First, let's understand the given information:

- Triangle $ABC$ is isosceles with $AB = AC = x$ and base $BC = 20$.
- Point $D$ is on $BC$ and point $E$ is on $AC$ such that $DE$ is parallel to $AB$.
- $CD = DE = 8$.

We need to find the length of $BE$.

Now, let's reason step by step:

1. Since $DE$ is parallel to $AB$, $\angle CDE = \angle CAB$ and $\angle CED = \angle CBA$. This means that triangles $CDE$ and $CAB$ are similar by AA (Angle-Angle) similarity.

2. Because triangles $CDE$ and $CAB$ are similar, the ratios of their corresponding sides are equal. Therefore, we have:

   $$\frac{CD}{CA} = \frac{DE}{AB} = \frac{CE}{CB}$$

3. Substituting the given values into the first equality, we get:

   $$\frac{8}{x} = \frac{8}{20}$$

   Solving for $x$, we find:

   $$x = 20$$

4. Now, let's use the second equality to find $CE$:

   $$\frac{8}{20} = \frac{CE}{20}$$

   Solving for $CE$, we find:

   $$CE = 8$$

5. Finally, to find $BE$, we subtract $CE$ from $AC$:

   $$BE = AC - CE = 20 - 8 = 12$$

So, the length of $BE$ is $\boxed{12}$.<|eot_id|>
2025-04-17 09:30:57 - INFO - __main__ - Rejected sample 409 of the raw training set:

<|begin_of_text|><|start_header_id|>assistant<|end_header_id|>

Step 1: Understand the problem. We have an isosceles triangle $ABC$ with base $BC$ of length 20 and legs of length $x$. Point $D$ is on $\overline{BC}$ and point $E$ is on $\overline{AC}$ such that $\overline{DE}$ is parallel to $\overline{AB}$. The lengths $CD$ and $DE$ are both 8. We need to find the length of $\overline{BE}$.

Step 2: Recognize that $\triangle ABC$ and $\triangle DEC$ are similar triangles. This is because $\overline{DE}$ is parallel to $\overline{AB}$, which means that $\angle C$ is common to both triangles and $\angle DEC$ and $\angle ABC$ are corresponding angles.

Step 3: Write down the proportion of the corresponding sides of the similar triangles. This gives us $\frac{AB}{DE}=\frac{BC}{DC}$.

Step 4: Substitute the given values into the proportion. This gives us $\frac{x}{8}=\frac{20}{8}$.

Step 5: Solve for $x$. This gives us $x=\frac{20\cdot 8}{8}=20$. So, the length of the legs of the triangle is 20.

Step 6: Recognize that $\triangle EBC$ is also similar to $\triangle ABC$ and $\triangle DEC$. This is because $\angle EBC$ is congruent to $\angle ABC$ (corresponding angles) and $\angle ECB$ is congruent to $\angle BCA$ (they are the same angle).

Step 7: Write down the proportion of the corresponding sides of the similar triangles $\triangle EBC$ and $\triangle DEC$. This gives us $\frac{BE}{DE}=\frac{BC}{DC}$.

Step 8: Substitute the given values into the proportion. This gives us $\frac{BE}{8}=\frac{20}{8}$.

Step 9: Solve for $BE$. This gives us $BE=\frac{8\cdot 20}{8}=\boxed{20}$. So, the length of $\overline{BE}$ is 20.<|eot_id|>
DEBUG: model type = <class 'str'>
/home/username/.conda/envs/llm/lib/python3.11/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': model_init_kwargs, ref_model_init_kwargs, beta, max_length, max_prompt_length, loss_type. Will not be supported from version '1.0.0'.

Deprecated positional argument(s) used in DPOTrainer, please use the DPOConfig to set these arguments instead.
  warnings.warn(message, FutureWarning)
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:171: UserWarning: You passed `model_init_kwargs` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:199: UserWarning: You passed `ref_model_init_kwargs` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:226: UserWarning: You passed a model_id to the DPOTrainer. This will automatically create an `AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you.
  warnings.warn(
[INFO|configuration_utils.py:731] 2025-04-17 09:30:57,703 >> loading configuration file /data/username/grafting/saves/llama3-8b/full/sft_math/config.json
[INFO|configuration_utils.py:800] 2025-04-17 09:30:57,704 >> Model config LlamaConfig {
  "_name_or_path": "/data/username/grafting/saves/llama3-8b/full/sft_math",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": [
    128001,
    128008,
    128009
  ],
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 8.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "transformers_version": "4.43.4",
  "use_cache": false,
  "vocab_size": 128256
}

[INFO|modeling_utils.py:3641] 2025-04-17 09:30:57,708 >> loading weights file /data/username/grafting/saves/llama3-8b/full/sft_math/model.safetensors.index.json
[INFO|modeling_utils.py:3786] 2025-04-17 09:30:57,708 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
[INFO|configuration_utils.py:1038] 2025-04-17 09:30:57,716 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": [
    128001,
    128008,
    128009
  ],
  "use_cache": false
}

Formatting comparisons with prompt template (num_proc=12): 100%|██████████| 11615/11615 [00:02<00:00, 4122.87 examples/s]
[2025-04-17 09:30:59,916] [INFO] [partition_parameters.py:345:__exit__] finished initializing model - num_params = 291, num_elems = 8.03B
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00,  1.22it/s]
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:233: UserWarning: You passed a ref model_id to the DPOTrainer. This will automatically create an `AutoModelForCausalLM`
  warnings.warn(
Loading checkpoint shards: 100%|██████████| 4/4 [00:05<00:00,  1.27s/it]
[INFO|modeling_utils.py:4473] 2025-04-17 09:31:05,036 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4481] 2025-04-17 09:31:05,037 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /data/username/grafting/saves/llama3-8b/full/sft_math.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:991] 2025-04-17 09:31:05,039 >> loading configuration file /data/username/grafting/saves/llama3-8b/full/sft_math/generation_config.json
[INFO|configuration_utils.py:1038] 2025-04-17 09:31:05,039 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": [
    128001,
    128008,
    128009
  ],
  "temperature": 0.6,
  "top_p": 0.9
}

[INFO|configuration_utils.py:731] 2025-04-17 09:31:05,040 >> loading configuration file /data/username/grafting/saves/llama3-8b/full/sft_math/config.json
[INFO|configuration_utils.py:800] 2025-04-17 09:31:05,040 >> Model config LlamaConfig {
  "_name_or_path": "/data/username/grafting/saves/llama3-8b/full/sft_math",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": [
    128001,
    128008,
    128009
  ],
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 8.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "transformers_version": "4.43.4",
  "use_cache": false,
  "vocab_size": 128256
}

[INFO|modeling_utils.py:3641] 2025-04-17 09:31:05,041 >> loading weights file /data/username/grafting/saves/llama3-8b/full/sft_math/model.safetensors.index.json
[INFO|modeling_utils.py:3786] 2025-04-17 09:31:05,041 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
[INFO|configuration_utils.py:1038] 2025-04-17 09:31:05,044 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": [
    128001,
    128008,
    128009
  ],
  "use_cache": false
}

[2025-04-17 09:31:05,962] [INFO] [partition_parameters.py:345:__exit__] finished initializing model - num_params = 582, num_elems = 16.06B
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00,  1.24it/s]
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:389: UserWarning: You passed `max_length` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:402: UserWarning: You passed `max_prompt_length` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:487: UserWarning: You passed `loss_type` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:504: UserWarning: You passed `beta` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:504: UserWarning: You passed `beta` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:389: UserWarning: You passed `max_length` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:402: UserWarning: You passed `max_prompt_length` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:389: UserWarning: You passed `max_length` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:402: UserWarning: You passed `max_prompt_length` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:487: UserWarning: You passed `loss_type` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:504: UserWarning: You passed `beta` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:487: UserWarning: You passed `loss_type` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:504: UserWarning: You passed `beta` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00,  1.24it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00,  1.19it/s]
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:389: UserWarning: You passed `max_length` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:402: UserWarning: You passed `max_prompt_length` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:487: UserWarning: You passed `loss_type` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:504: UserWarning: You passed `beta` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
Loading checkpoint shards: 100%|██████████| 4/4 [00:05<00:00,  1.26s/it]
[INFO|modeling_utils.py:4473] 2025-04-17 09:31:11,040 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4481] 2025-04-17 09:31:11,040 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /data/username/grafting/saves/llama3-8b/full/sft_math.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:991] 2025-04-17 09:31:11,042 >> loading configuration file /data/username/grafting/saves/llama3-8b/full/sft_math/generation_config.json
[INFO|configuration_utils.py:1038] 2025-04-17 09:31:11,043 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": [
    128001,
    128008,
    128009
  ],
  "temperature": 0.6,
  "top_p": 0.9
}

/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:389: UserWarning: You passed `max_length` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:402: UserWarning: You passed `max_prompt_length` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:487: UserWarning: You passed `loss_type` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
/home/username/.conda/envs/llm/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py:504: UserWarning: You passed `beta` to the DPOTrainer, the value you passed will override the one in the `DPOConfig`.
  warnings.warn(
Map:   0%|          | 0/11615 [00:00<?, ? examples/s]
[WARNING|tokenization_utils_base.py:4119] 2025-04-17 09:31:11,556 >> Token indices sequence length is longer than the specified maximum sequence length for this model (2107 > 2048). Running this sequence through the model will result in indexing errors
Map: 100%|██████████| 11615/11615 [00:36<00:00, 321.37 examples/s]
[INFO|trainer.py:648] 2025-04-17 09:31:47,667 >> Using auto half precision backend
[2025-04-17 09:31:47,668] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.4, git-hash=unknown, git-branch=unknown
[2025-04-17 09:31:47,676] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2025-04-17 09:31:47,677] [INFO] [logging.py:96:log_dist] [Rank 0] Creating ZeRO Offload
[2025-04-17 09:31:47,828] [INFO] [utils.py:781:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
[2025-04-17 09:31:47,828] [INFO] [utils.py:782:see_memory_usage] MA 3.74 GB         Max_MA 6.55 GB         CA 6.98 GB         Max_CA 7 GB 
[2025-04-17 09:31:47,828] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 17.63 GB, percent = 2.3%
Parameter Offload: Total persistent parameters: 266240 in 65 params
[2025-04-17 09:31:47,949] [INFO] [utils.py:781:see_memory_usage] DeepSpeedZeRoOffload initialize [end]
[2025-04-17 09:31:47,950] [INFO] [utils.py:782:see_memory_usage] MA 3.74 GB         Max_MA 3.74 GB         CA 6.98 GB         Max_CA 7 GB 
[2025-04-17 09:31:47,950] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 17.66 GB, percent = 2.3%
[2025-04-17 09:31:47,951] [INFO] [config.py:997:print] DeepSpeedEngine configuration:
[2025-04-17 09:31:47,951] [INFO] [config.py:1001:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2025-04-17 09:31:47,951] [INFO] [config.py:1001:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2025-04-17 09:31:47,951] [INFO] [config.py:1001:print]   amp_enabled .................. False
[2025-04-17 09:31:47,951] [INFO] [config.py:1001:print]   amp_params ................... False
[2025-04-17 09:31:47,951] [INFO] [config.py:1001:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2025-04-17 09:31:47,951] [INFO] [config.py:1001:print]   bfloat16_enabled ............. True
[2025-04-17 09:31:47,951] [INFO] [config.py:1001:print]   bfloat16_immediate_grad_update  False
[2025-04-17 09:31:47,951] [INFO] [config.py:1001:print]   checkpoint_parallel_write_pipeline  False
[2025-04-17 09:31:47,951] [INFO] [config.py:1001:print]   checkpoint_tag_validation_enabled  True
[2025-04-17 09:31:47,951] [INFO] [config.py:1001:print]   checkpoint_tag_validation_fail  False
[2025-04-17 09:31:47,951] [INFO] [config.py:1001:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f12c0409dd0>
[2025-04-17 09:31:47,951] [INFO] [config.py:1001:print]   communication_data_type ...... None
[2025-04-17 09:31:47,951] [INFO] [config.py:1001:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2025-04-17 09:31:47,951] [INFO] [config.py:1001:print]   curriculum_enabled_legacy .... False
[2025-04-17 09:31:47,951] [INFO] [config.py:1001:print]   curriculum_params_legacy ..... False
[2025-04-17 09:31:47,951] [INFO] [config.py:1001:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2025-04-17 09:31:47,951] [INFO] [config.py:1001:print]   data_efficiency_enabled ...... False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   dataloader_drop_last ......... False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   disable_allgather ............ False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   dump_state ................... False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   dynamic_loss_scale_args ...... None
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   eigenvalue_enabled ........... False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   eigenvalue_gas_boundary_resolution  1
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   eigenvalue_layer_num ......... 0
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   eigenvalue_max_iter .......... 100
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   eigenvalue_stability ......... 1e-06
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   eigenvalue_tol ............... 0.01
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   eigenvalue_verbose ........... False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   elasticity_enabled ........... False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   fp16_auto_cast ............... None
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   fp16_enabled ................. False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   fp16_master_weights_and_gradients  False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   global_rank .................. 0
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   grad_accum_dtype ............. None
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   gradient_accumulation_steps .. 16
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   gradient_clipping ............ 1.0
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   gradient_predivide_factor .... 1.0
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   graph_harvesting ............. False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   initial_dynamic_scale ........ 1
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   load_universal_checkpoint .... False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   loss_scale ................... 1.0
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   memory_breakdown ............. False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   mics_hierarchial_params_gather  False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   mics_shard_size .............. -1
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   optimizer_legacy_fusion ...... False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   optimizer_name ............... None
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   optimizer_params ............. None
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   pld_enabled .................. False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   pld_params ................... False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   prescale_gradients ........... False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   scheduler_name ............... None
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   scheduler_params ............. None
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   seq_parallel_communication_data_type  torch.float32
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   sparse_attention ............. None
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   sparse_gradients_enabled ..... False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   steps_per_print .............. inf
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   timers_config ................ enabled=True synchronized=True
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   train_batch_size ............. 128
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   train_micro_batch_size_per_gpu  1
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   use_data_before_expert_parallel_  False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   use_node_local_storage ....... False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   wall_clock_breakdown ......... False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   weight_quantization_config ... None
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   world_size ................... 8
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   zero_allow_untested_optimizer  False
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False, ratio=1.0) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=True use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   zero_enabled ................. True
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   zero_force_ds_cpu_optimizer .. True
[2025-04-17 09:31:47,952] [INFO] [config.py:1001:print]   zero_optimization_stage ...... 3
[2025-04-17 09:31:47,953] [INFO] [config.py:987:print_user_config]   json = {
    "train_batch_size": 128, 
    "train_micro_batch_size_per_gpu": 1, 
    "gradient_accumulation_steps": 16, 
    "zero_optimization": {
        "stage": 3, 
        "offload_optimizer": {
            "device": "none", 
            "nvme_path": null
        }, 
        "offload_param": {
            "device": "none", 
            "nvme_path": null
        }, 
        "stage3_gather_16bit_weights_on_model_save": true
    }, 
    "gradient_clipping": 1.0, 
    "steps_per_print": inf, 
    "bf16": {
        "enabled": true
    }, 
    "fp16": {
        "enabled": false
    }, 
    "zero_optimization.reduce_bucket_size": 1.677722e+07, 
    "zero_optimization.stage3_param_persistence_threshold": 4.096000e+04, 
    "zero_optimization.stage3_prefetch_bucket_size": 1.509949e+07
}
Map:   0%|          | 0/11615 [00:00<?, ? examples/s]
Map:   0%|          | 0/11615 [00:00<?, ? examples/s]
Map:   0%|          | 0/11615 [00:00<?, ? examples/s]
Map:   0%|          | 0/11615 [00:00<?, ? examples/s]
Map:   0%|          | 0/11615 [00:00<?, ? examples/s]
Map:   0%|          | 0/11615 [00:00<?, ? examples/s]
Map:   0%|          | 0/11615 [00:00<?, ? examples/s]
[2025-04-17 09:31:48,108] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.4, git-hash=unknown, git-branch=unknown
[2025-04-17 09:31:48,114] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2025-04-17 09:31:48,115] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2025-04-17 09:31:48,115] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2025-04-17 09:31:48,123] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2025-04-17 09:31:48,123] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'torch.optim.adamw.AdamW'>
[2025-04-17 09:31:48,123] [INFO] [logging.py:96:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer, MiCS is enabled False, Hierarchical params gather False
[2025-04-17 09:31:48,123] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 3 optimizer
[WARNING|tokenization_utils_base.py:4119] 2025-04-17 09:31:48,159 >> Token indices sequence length is longer than the specified maximum sequence length for this model (2107 > 2048). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:4119] 2025-04-17 09:31:48,159 >> Token indices sequence length is longer than the specified maximum sequence length for this model (2107 > 2048). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:4119] 2025-04-17 09:31:48,160 >> Token indices sequence length is longer than the specified maximum sequence length for this model (2107 > 2048). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:4119] 2025-04-17 09:31:48,160 >> Token indices sequence length is longer than the specified maximum sequence length for this model (2107 > 2048). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:4119] 2025-04-17 09:31:48,161 >> Token indices sequence length is longer than the specified maximum sequence length for this model (2107 > 2048). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:4119] 2025-04-17 09:31:48,162 >> Token indices sequence length is longer than the specified maximum sequence length for this model (2107 > 2048). Running this sequence through the model will result in indexing errors
Map:   0%|          | 34/11615 [00:00<00:35, 327.89 examples/s]
Map:   0%|          | 34/11615 [00:00<00:35, 326.45 examples/s]
Map:   0%|          | 34/11615 [00:00<00:35, 330.17 examples/s]
Map:   0%|          | 34/11615 [00:00<00:35, 328.76 examples/s]
Map:   0%|          | 34/11615 [00:00<00:35, 327.07 examples/s]
Map:   0%|          | 34/11615 [00:00<00:35, 328.77 examples/s]
[WARNING|tokenization_utils_base.py:4119] 2025-04-17 09:31:48,193 >> Token indices sequence length is longer than the specified maximum sequence length for this model (2107 > 2048). Running this sequence through the model will result in indexing errors
Map:   0%|          | 33/11615 [00:00<00:37, 312.96 examples/s]
[2025-04-17 09:31:48,233] [INFO] [utils.py:781:see_memory_usage] Stage 3 initialize beginning
[2025-04-17 09:31:48,234] [INFO] [utils.py:782:see_memory_usage] MA 3.74 GB         Max_MA 3.74 GB         CA 5.02 GB         Max_CA 7 GB 
[2025-04-17 09:31:48,234] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 17.81 GB, percent = 2.4%
[2025-04-17 09:31:48,235] [INFO] [stage3.py:130:__init__] Reduce bucket size 500,000,000
[2025-04-17 09:31:48,235] [INFO] [stage3.py:131:__init__] Prefetch bucket size 50,000,000
Map:   1%|          | 70/11615 [00:00<00:33, 339.68 examples/s]
Map:   1%|          | 70/11615 [00:00<00:33, 339.99 examples/s]
Map:   1%|          | 70/11615 [00:00<00:33, 339.98 examples/s]
Map:   1%|          | 70/11615 [00:00<00:34, 335.08 examples/s]
Map:   1%|          | 70/11615 [00:00<00:34, 338.57 examples/s]
Map:   1%|          | 70/11615 [00:00<00:34, 339.48 examples/s]
Map:   1%|          | 68/11615 [00:00<00:34, 330.07 examples/s]
[2025-04-17 09:31:48,342] [INFO] [utils.py:781:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
[2025-04-17 09:31:48,342] [INFO] [utils.py:782:see_memory_usage] MA 3.74 GB         Max_MA 3.74 GB         CA 5.02 GB         Max_CA 5 GB 
[2025-04-17 09:31:48,342] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 17.81 GB, percent = 2.4%
Parameter Offload: Total persistent parameters: 266240 in 65 params
Map:   1%|          | 105/11615 [00:00<00:33, 341.06 examples/s]
Map:   1%|          | 105/11615 [00:00<00:33, 341.29 examples/s]
Map:   1%|          | 105/11615 [00:00<00:33, 340.85 examples/s]
Map:   1%|          | 104/11615 [00:00<00:34, 336.09 examples/s]
Map:   1%|          | 105/11615 [00:00<00:33, 339.63 examples/s]
Map:   1%|          | 105/11615 [00:00<00:33, 338.61 examples/s]
Map:   1%|          | 103/11615 [00:00<00:34, 332.73 examples/s]
[2025-04-17 09:31:48,465] [INFO] [utils.py:781:see_memory_usage] DeepSpeedZeRoOffload initialize [end]
[2025-04-17 09:31:48,466] [INFO] [utils.py:782:see_memory_usage] MA 3.74 GB         Max_MA 3.74 GB         CA 5.02 GB         Max_CA 5 GB 
[2025-04-17 09:31:48,466] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 17.81 GB, percent = 2.4%
Map:   1%|          | 145/11615 [00:00<00:31, 359.57 examples/s]
Map:   1%|          | 145/11615 [00:00<00:31, 360.04 examples/s]
Map:   1%|          | 145/11615 [00:00<00:31, 359.66 examples/s]
Map:   1%|          | 145/11615 [00:00<00:32, 358.12 examples/s]
Map:   1%|          | 143/11615 [00:00<00:32, 349.64 examples/s]
Map:   1%|          | 145/11615 [00:00<00:32, 357.71 examples/s]
Map:   1%|          | 141/11615 [00:00<00:32, 348.06 examples/s]
Map:   2%|▏         | 184/11615 [00:00<00:31, 366.50 examples/s]
Map:   2%|▏         | 184/11615 [00:00<00:31, 367.24 examples/s]
Map:   2%|▏         | 184/11615 [00:00<00:31, 366.86 examples/s]
[2025-04-17 09:31:48,576] [INFO] [utils.py:781:see_memory_usage] Before creating fp16 partitions
[2025-04-17 09:31:48,576] [INFO] [utils.py:782:see_memory_usage] MA 3.74 GB         Max_MA 3.74 GB         CA 5.02 GB         Max_CA 5 GB 
[2025-04-17 09:31:48,576] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 17.82 GB, percent = 2.4%
Map:   2%|▏         | 181/11615 [00:00<00:32, 356.42 examples/s]
Map:   2%|▏         | 184/11615 [00:00<00:31, 364.83 examples/s]
Map:   2%|▏         | 184/11615 [00:00<00:31, 364.93 examples/s]
Map:   2%|▏         | 180/11615 [00:00<00:37, 301.71 examples/s]
Map:   2%|▏         | 222/11615 [00:00<00:30, 368.76 examples/s]
Map:   2%|▏         | 220/11615 [00:00<00:31, 361.56 examples/s]
Map:   2%|▏         | 257/11615 [00:00<00:31, 357.27 examples/s]
Map:   2%|▏         | 230/11615 [00:00<00:37, 306.69 examples/s]
Map:   2%|▏         | 240/11615 [00:00<00:39, 289.15 examples/s]
Map:   2%|▏         | 240/11615 [00:00<00:39, 289.16 examples/s]
Map:   2%|▏         | 240/11615 [00:00<00:39, 291.39 examples/s]
Map:   2%|▏         | 240/11615 [00:00<00:39, 290.11 examples/s]
Map:   2%|▏         | 276/11615 [00:00<00:31, 361.82 examples/s]
Map:   3%|▎         | 294/11615 [00:00<00:31, 355.47 examples/s]
Map:   2%|▏         | 267/11615 [00:00<00:35, 321.06 examples/s]
Map:   2%|▏         | 276/11615 [00:00<00:37, 304.40 examples/s]
Map:   2%|▏         | 276/11615 [00:00<00:37, 303.96 examples/s]
Map:   2%|▏         | 276/11615 [00:00<00:37, 305.35 examples/s]
Map:   2%|▏         | 276/11615 [00:00<00:37, 304.38 examples/s]
Map:   3%|▎         | 313/11615 [00:00<00:31, 358.71 examples/s]
Map:   3%|▎         | 330/11615 [00:00<00:31, 353.72 examples/s]
Map:   3%|▎         | 303/11615 [00:00<00:34, 328.22 examples/s]
Map:   3%|▎         | 312/11615 [00:00<00:35, 316.91 examples/s]
Map:   3%|▎         | 312/11615 [00:00<00:35, 316.53 examples/s]
Map:   3%|▎         | 312/11615 [00:00<00:35, 317.15 examples/s]
Map:   3%|▎         | 312/11615 [00:00<00:35, 316.47 examples/s]
Map:   3%|▎         | 353/11615 [00:00<00:30, 366.94 examples/s]
Map:   3%|▎         | 372/11615 [00:01<00:30, 370.35 examples/s]
Map:   3%|▎         | 340/11615 [00:01<00:33, 336.98 examples/s]
Map:   3%|▎         | 351/11615 [00:01<00:33, 334.45 examples/s]
Map:   3%|▎         | 351/11615 [00:01<00:33, 333.75 examples/s]
Map:   3%|▎         | 351/11615 [00:01<00:33, 333.98 examples/s]
Map:   3%|▎         | 351/11615 [00:01<00:33, 333.38 examples/s]
Map:   3%|▎         | 393/11615 [00:01<00:30, 371.11 examples/s]
Map:   4%|▎         | 410/11615 [00:01<00:30, 367.56 examples/s]
Map:   3%|▎         | 391/11615 [00:01<00:32, 348.59 examples/s]
Map:   3%|▎         | 380/11615 [00:01<00:32, 349.13 examples/s]
Map:   3%|▎         | 390/11615 [00:01<00:32, 347.75 examples/s]
Map:   3%|▎         | 390/11615 [00:01<00:32, 346.89 examples/s]
Map:   3%|▎         | 391/11615 [00:01<00:32, 347.69 examples/s]
Map:   4%|▎         | 432/11615 [00:01<00:29, 373.10 examples/s]
Map:   4%|▍         | 453/11615 [00:01<00:29, 384.42 examples/s]
Map:   4%|▎         | 428/11615 [00:01<00:31, 355.64 examples/s]
Map:   4%|▎         | 429/11615 [00:01<00:31, 357.58 examples/s]
Map:   4%|▎         | 419/11615 [00:01<00:31, 355.32 examples/s]
Map:   4%|▎         | 431/11615 [00:01<00:31, 357.07 examples/s]
Map:   4%|▎         | 431/11615 [00:01<00:31, 356.51 examples/s]
Map:   4%|▍         | 474/11615 [00:01<00:28, 385.62 examples/s]
Map:   4%|▍         | 470/11615 [00:01<00:30, 369.33 examples/s]
Map:   4%|▍         | 470/11615 [00:01<00:30, 368.89 examples/s]
Map:   4%|▍         | 462/11615 [00:01<00:29, 374.82 examples/s]
Map:   4%|▍         | 473/11615 [00:01<00:29, 372.53 examples/s]
Map:   4%|▍         | 474/11615 [00:01<00:29, 373.86 examples/s]
Map:   4%|▍         | 507/11615 [00:01<00:29, 371.25 examples/s]
Map:   5%|▍         | 526/11615 [00:01<00:30, 367.41 examples/s]
Map:   4%|▍         | 522/11615 [00:01<00:30, 358.64 examples/s]
Map:   4%|▍         | 512/11615 [00:01<00:30, 358.24 examples/s]
Map:   4%|▍         | 522/11615 [00:01<00:30, 358.27 examples/s]
Map:   5%|▍         | 524/11615 [00:01<00:30, 359.03 examples/s]
Map:   5%|▍         | 526/11615 [00:01<00:30, 360.25 examples/s]
Map:   5%|▍         | 560/11615 [00:01<00:30, 357.28 examples/s]
Map:   5%|▍         | 579/11615 [00:01<00:30, 358.67 examples/s]
Map:   5%|▍         | 550/11615 [00:01<00:30, 358.21 examples/s]
Map:   5%|▌         | 597/11615 [00:01<00:30, 359.66 examples/s]
Map:   5%|▍         | 575/11615 [00:01<00:31, 350.95 examples/s]
Map:   5%|▍         | 575/11615 [00:01<00:31, 350.68 examples/s]
Map:   5%|▍         | 579/11615 [00:01<00:31, 353.28 examples/s]
Map:   5%|▍         | 579/11615 [00:01<00:31, 353.98 examples/s]
Map:   5%|▌         | 619/11615 [00:01<00:29, 366.96 examples/s]
Map:   5%|▌         | 638/11615 [00:01<00:29, 369.48 examples/s]
Map:   5%|▌         | 615/11615 [00:01<00:30, 361.66 examples/s]
Map:   5%|▌         | 604/11615 [00:01<00:30, 355.81 examples/s]
Map:   5%|▌         | 615/11615 [00:01<00:30, 361.18 examples/s]
Map:   5%|▌         | 619/11615 [00:01<00:30, 361.91 examples/s]
Map:   5%|▌         | 619/11615 [00:01<00:30, 362.82 examples/s]
Map:   6%|▌         | 659/11615 [00:01<00:29, 370.79 examples/s]
Map:   6%|▌         | 676/11615 [00:01<00:29, 367.93 examples/s]
Map:   6%|▌         | 653/11615 [00:01<00:30, 363.71 examples/s]
Map:   6%|▌         | 653/11615 [00:01<00:30, 363.13 examples/s]
Map:   6%|▌         | 644/11615 [00:01<00:30, 362.71 examples/s]
Map:   6%|▌         | 658/11615 [00:01<00:29, 367.42 examples/s]
Map:   6%|▌         | 659/11615 [00:01<00:29, 368.60 examples/s]
Map:   6%|▌         | 711/11615 [00:01<00:30, 359.17 examples/s]
Map:   6%|▌         | 691/11615 [00:01<00:29, 364.20 examples/s]
Map:   6%|▌         | 691/11615 [00:01<00:30, 363.98 examples/s]
Map:   6%|▋         | 730/11615 [00:02<00:30, 358.46 examples/s]
Map:   6%|▌         | 711/11615 [00:02<00:30, 358.35 examples/s]
Map:   6%|▌         | 710/11615 [00:02<00:30, 354.21 examples/s]
Map:   6%|▌         | 700/11615 [00:02<00:30, 358.63 examples/s]
Map:   6%|▋         | 750/11615 [00:02<00:30, 358.83 examples/s]
Map:   6%|▋         | 728/11615 [00:02<00:30, 359.90 examples/s]
Map:   7%|▋         | 770/11615 [00:02<00:30, 361.22 examples/s]
Map:   6%|▋         | 744/11615 [00:02<00:30, 358.05 examples/s]
Map:   6%|▋         | 737/11615 [00:02<00:30, 358.81 examples/s]
Map:   6%|▋         | 750/11615 [00:02<00:30, 358.51 examples/s]
Map:   6%|▋         | 750/11615 [00:02<00:30, 355.26 examples/s]
Map:   7%|▋         | 790/11615 [00:02<00:29, 362.00 examples/s]
Map:   7%|▋         | 810/11615 [00:02<00:29, 366.05 examples/s]
Map:   7%|▋         | 782/11615 [00:02<00:30, 357.85 examples/s]
Map:   7%|▋         | 774/11615 [00:02<00:30, 357.51 examples/s]
Map:   7%|▋         | 790/11615 [00:02<00:29, 361.62 examples/s]
Map:   7%|▋         | 790/11615 [00:02<00:30, 359.37 examples/s]
Map:   7%|▋         | 829/11615 [00:02<00:29, 366.75 examples/s]
Map:   7%|▋         | 800/11615 [00:02<00:30, 359.14 examples/s]
Map:   7%|▋         | 847/11615 [00:02<00:29, 363.61 examples/s]
Map:   7%|▋         | 820/11615 [00:02<00:29, 359.98 examples/s]
Map:   7%|▋         | 813/11615 [00:02<00:29, 364.52 examples/s]
Map:   7%|▋         | 829/11615 [00:02<00:29, 366.31 examples/s]
Map:   7%|▋         | 829/11615 [00:02<00:29, 365.01 examples/s]
Map:   7%|▋         | 866/11615 [00:02<00:29, 362.69 examples/s]
Map:   7%|▋         | 840/11615 [00:02<00:29, 362.84 examples/s]
Map:   7%|▋         | 858/11615 [00:02<00:29, 359.37 examples/s]
Map:   7%|▋         | 866/11615 [00:02<00:29, 361.33 examples/s]
Map:   8%|▊         | 901/11615 [00:02<00:29, 357.24 examples/s]
Map:   7%|▋         | 867/11615 [00:02<00:29, 360.48 examples/s]
Map:   8%|▊         | 882/11615 [00:02<00:30, 355.40 examples/s]
Map:   8%|▊         | 922/11615 [00:02<00:29, 362.00 examples/s]
Map:   8%|▊         | 892/11615 [00:02<00:30, 354.28 examples/s]
Map:   8%|▊         | 941/11615 [00:02<00:29, 363.91 examples/s]
Map:   8%|▊         | 911/11615 [00:02<00:30, 355.79 examples/s]
Map:   8%|▊         | 922/11615 [00:02<00:29, 362.07 examples/s]
Map:   8%|▊         | 922/11615 [00:02<00:29, 361.29 examples/s]
Map:   8%|▊         | 960/11615 [00:02<00:29, 362.45 examples/s]
Map:   8%|▊         | 922/11615 [00:02<00:29, 359.21 examples/s]
Map:   8%|▊         | 930/11615 [00:02<00:30, 355.19 examples/s]
Map:   8%|▊         | 979/11615 [00:02<00:29, 364.15 examples/s]
Map:   8%|▊         | 950/11615 [00:02<00:29, 356.55 examples/s]
Map:   8%|▊         | 960/11615 [00:02<00:29, 361.70 examples/s]
Map:   8%|▊         | 960/11615 [00:02<00:29, 361.88 examples/s]
Map:   9%|▊         | 998/11615 [00:02<00:29, 363.95 examples/s]
Map:   8%|▊         | 960/11615 [00:02<00:29, 359.62 examples/s]
Map:   8%|▊         | 970/11615 [00:02<00:29, 358.42 examples/s]
Map:   9%|▊         | 989/11615 [00:02<00:29, 360.70 examples/s]
Map:   9%|▊         | 998/11615 [00:02<00:29, 363.51 examples/s]
Map:   9%|▊         | 998/11615 [00:02<00:29, 362.75 examples/s]
Map:   9%|▉         | 1036/11615 [00:02<00:28, 365.27 examples/s]
Map:   9%|▉         | 1036/11615 [00:02<00:28, 364.82 examples/s]
Map:   9%|▊         | 998/11615 [00:02<00:29, 360.94 examples/s]
Map:   9%|▉         | 1026/11615 [00:02<00:29, 361.09 examples/s]
Map:   9%|▉         | 1026/11615 [00:02<00:29, 361.01 examples/s]
Map:   9%|▉         | 1036/11615 [00:02<00:28, 365.67 examples/s]
Map:   9%|▉         | 1036/11615 [00:02<00:29, 364.63 examples/s]
Map:   9%|▉         | 1036/11615 [00:02<00:29, 362.51 examples/s]
Map:   9%|▉         | 1088/11615 [00:03<00:30, 349.32 examples/s]
Map:   9%|▉         | 1088/11615 [00:03<00:30, 350.74 examples/s]
Map:   9%|▉         | 1075/11615 [00:03<00:30, 342.57 examples/s]
Map:   9%|▉         | 1079/11615 [00:03<00:30, 350.56 examples/s]
Map:   9%|▉         | 1088/11615 [00:03<00:30, 349.70 examples/s]
Map:   9%|▉         | 1088/11615 [00:03<00:30, 349.31 examples/s]
Map:  10%|▉         | 1128/11615 [00:03<00:29, 361.31 examples/s]
Map:  10%|▉         | 1128/11615 [00:03<00:28, 361.85 examples/s]
Map:   9%|▉         | 1088/11615 [00:03<00:30, 347.14 examples/s]
Map:  10%|▉         | 1114/11615 [00:03<00:29, 353.49 examples/s]
Map:  10%|▉         | 1118/11615 [00:03<00:29, 355.96 examples/s]
Map:  10%|▉         | 1128/11615 [00:03<00:28, 362.20 examples/s]
Map:  10%|▉         | 1128/11615 [00:03<00:28, 361.92 examples/s]
Map:  10%|█         | 1165/11615 [00:03<00:29, 359.56 examples/s]
Map:  10%|█         | 1165/11615 [00:03<00:28, 361.15 examples/s]
Map:  10%|▉         | 1128/11615 [00:03<00:29, 359.45 examples/s]
Map:  10%|▉         | 1153/11615 [00:03<00:28, 361.16 examples/s]
Map:  10%|▉         | 1157/11615 [00:03<00:28, 360.98 examples/s]
Map:  10%|█         | 1165/11615 [00:03<00:28, 361.33 examples/s]
Map:  10%|█         | 1165/11615 [00:03<00:28, 360.84 examples/s]
Map:  10%|█         | 1204/11615 [00:03<00:28, 365.21 examples/s]
Map:  10%|█         | 1204/11615 [00:03<00:28, 367.67 examples/s]
Map:  10%|█         | 1165/11615 [00:03<00:29, 358.37 examples/s]
Map:  10%|█         | 1190/11615 [00:03<00:29, 357.22 examples/s]
Map:  10%|█         | 1194/11615 [00:03<00:28, 360.13 examples/s]
Map:  10%|█         | 1204/11615 [00:03<00:28, 368.66 examples/s]
Map:  10%|█         | 1204/11615 [00:03<00:28, 367.69 examples/s]
Map:  11%|█         | 1242/11615 [00:03<00:28, 365.94 examples/s]
Map:  11%|█         | 1242/11615 [00:03<00:28, 369.12 examples/s]
Map:  10%|█         | 1204/11615 [00:03<00:28, 365.14 examples/s]
Map:  11%|█         | 1230/11615 [00:03<00:28, 364.79 examples/s]
Map:  11%|█         | 1235/11615 [00:03<00:27, 371.85 examples/s]
Map:  11%|█         | 1243/11615 [00:03<00:27, 371.39 examples/s]
Map:  11%|█         | 1242/11615 [00:03<00:28, 368.85 examples/s]
Map:  11%|█         | 1280/11615 [00:03<00:28, 363.18 examples/s]
Map:  11%|█         | 1280/11615 [00:03<00:28, 366.36 examples/s]
Map:  11%|█         | 1242/11615 [00:03<00:28, 366.42 examples/s]
Map:  11%|█         | 1270/11615 [00:03<00:28, 363.89 examples/s]
Map:  11%|█         | 1280/11615 [00:03<00:28, 365.74 examples/s]
Map:  11%|█▏        | 1318/11615 [00:03<00:28, 364.80 examples/s]
Map:  11%|█▏        | 1319/11615 [00:03<00:28, 366.88 examples/s]
Map:  11%|█         | 1289/11615 [00:03<00:28, 365.99 examples/s]
Map:  11%|█         | 1280/11615 [00:03<00:28, 363.36 examples/s]
Map:  11%|█         | 1300/11615 [00:03<00:27, 369.71 examples/s]
Map:  11%|█▏        | 1308/11615 [00:03<00:28, 364.47 examples/s]
Map:  11%|█▏        | 1319/11615 [00:03<00:28, 366.42 examples/s]
Map:  11%|█▏        | 1318/11615 [00:03<00:28, 364.94 examples/s]
Map:  12%|█▏        | 1370/11615 [00:03<00:28, 353.60 examples/s]
Map:  12%|█▏        | 1371/11615 [00:03<00:28, 356.59 examples/s]
Map:  12%|█▏        | 1342/11615 [00:03<00:28, 359.01 examples/s]
Map:  12%|█▏        | 1354/11615 [00:03<00:28, 361.78 examples/s]
Map:  12%|█▏        | 1360/11615 [00:03<00:29, 352.60 examples/s]
Map:  12%|█▏        | 1371/11615 [00:03<00:28, 356.85 examples/s]
Map:  12%|█▏        | 1406/11615 [00:03<00:28, 353.88 examples/s]
Map:  12%|█▏        | 1407/11615 [00:03<00:28, 356.42 examples/s]
Map:  12%|█▏        | 1370/11615 [00:03<00:29, 353.09 examples/s]
Map:  12%|█▏        | 1395/11615 [00:03<00:29, 351.28 examples/s]
Map:  12%|█▏        | 1407/11615 [00:03<00:28, 358.16 examples/s]
Map:  12%|█▏        | 1407/11615 [00:03<00:28, 356.77 examples/s]
Map:  12%|█▏        | 1406/11615 [00:03<00:28, 352.64 examples/s]
Map:  12%|█▏        | 1414/11615 [00:04<00:28, 351.97 examples/s]
Map:  13%|█▎        | 1460/11615 [00:04<00:29, 348.56 examples/s]
Map:  13%|█▎        | 1460/11615 [00:04<00:28, 351.00 examples/s]
Map:  12%|█▏        | 1431/11615 [00:04<00:29, 348.55 examples/s]
Map:  13%|█▎        | 1460/11615 [00:04<00:28, 352.66 examples/s]
Map:  13%|█▎        | 1460/11615 [00:04<00:28, 350.86 examples/s]
Map:  13%|█▎        | 1500/11615 [00:04<00:28, 351.03 examples/s]
Map:  13%|█▎        | 1500/11615 [00:04<00:28, 352.69 examples/s]
Map:  13%|█▎        | 1469/11615 [00:04<00:28, 352.56 examples/s]
Map:  13%|█▎        | 1460/11615 [00:04<00:29, 346.18 examples/s]
Map:  13%|█▎        | 1469/11615 [00:04<00:28, 350.12 examples/s]
Map:  13%|█▎        | 1500/11615 [00:04<00:28, 353.77 examples/s]
Map:  13%|█▎        | 1500/11615 [00:04<00:28, 353.01 examples/s]
Map:  13%|█▎        | 1539/11615 [00:04<00:28, 356.55 examples/s]
Map:  13%|█▎        | 1539/11615 [00:04<00:28, 357.74 examples/s]
Map:  13%|█▎        | 1499/11615 [00:04<00:28, 351.15 examples/s]
Map:  13%|█▎        | 1522/11615 [00:04<00:28, 350.03 examples/s]
Map:  13%|█▎        | 1521/11615 [00:04<00:28, 348.13 examples/s]
Map:  13%|█▎        | 1539/11615 [00:04<00:28, 358.47 examples/s]
Map:  13%|█▎        | 1539/11615 [00:04<00:28, 358.02 examples/s]
Map:  14%|█▎        | 1577/11615 [00:04<00:27, 359.78 examples/s]
Map:  14%|█▎        | 1577/11615 [00:04<00:27, 359.79 examples/s]
Map:  13%|█▎        | 1536/11615 [00:04<00:28, 351.78 examples/s]
Map:  13%|█▎        | 1560/11615 [00:04<00:28, 355.13 examples/s]
Map:  13%|█▎        | 1560/11615 [00:04<00:28, 354.10 examples/s]
Map:  14%|█▎        | 1577/11615 [00:04<00:27, 361.06 examples/s]
Map:  14%|█▎        | 1577/11615 [00:04<00:27, 360.25 examples/s]
Map:  14%|█▍        | 1614/11615 [00:04<00:27, 359.24 examples/s]
Map:  14%|█▍        | 1614/11615 [00:04<00:27, 358.79 examples/s]
Map:  14%|█▎        | 1574/11615 [00:04<00:28, 357.42 examples/s]
Map:  14%|█▍        | 1598/11615 [00:04<00:27, 358.94 examples/s]
Map:  14%|█▍        | 1598/11615 [00:04<00:27, 357.85 examples/s]
Map:  14%|█▍        | 1614/11615 [00:04<00:27, 359.92 examples/s]
Map:  14%|█▍        | 1614/11615 [00:04<00:27, 359.48 examples/s]
Map:  14%|█▍        | 1654/11615 [00:04<00:27, 367.50 examples/s]
Map:  14%|█▍        | 1654/11615 [00:04<00:27, 367.14 examples/s]
Map:  14%|█▍        | 1610/11615 [00:04<00:28, 352.99 examples/s]
Map:  14%|█▍        | 1635/11615 [00:04<00:27, 360.98 examples/s]
Map:  14%|█▍        | 1635/11615 [00:04<00:27, 358.85 examples/s]
Map:  14%|█▍        | 1654/11615 [00:04<00:27, 367.78 examples/s]
Map:  14%|█▍        | 1654/11615 [00:04<00:27, 367.68 examples/s]
Map:  15%|█▍        | 1692/11615 [00:04<00:26, 369.26 examples/s]
Map:  15%|█▍        | 1692/11615 [00:04<00:26, 368.77 examples/s]
Map:  14%|█▍        | 1650/11615 [00:04<00:27, 364.81 examples/s]
Map:  15%|█▍        | 1692/11615 [00:04<00:26, 369.47 examples/s]
Map:  15%|█▍        | 1692/11615 [00:04<00:26, 369.30 examples/s]
Map:  15%|█▍        | 1729/11615 [00:04<00:27, 364.72 examples/s]
Map:  15%|█▍        | 1692/11615 [00:04<00:27, 363.84 examples/s]
Map:  15%|█▍        | 1692/11615 [00:04<00:27, 362.20 examples/s]
Map:  15%|█▍        | 1688/11615 [00:04<00:27, 367.08 examples/s]
Map:  15%|█▌        | 1750/11615 [00:04<00:26, 367.34 examples/s]
Map:  15%|█▌        | 1766/11615 [00:04<00:27, 358.88 examples/s]
Map:  15%|█▌        | 1750/11615 [00:04<00:26, 367.91 examples/s]
Map:  15%|█▌        | 1750/11615 [00:04<00:26, 367.71 examples/s]
Map:  15%|█▌        | 1749/11615 [00:04<00:27, 363.10 examples/s]
Map:  15%|█▌        | 1749/11615 [00:04<00:27, 365.09 examples/s]
Map:  15%|█▍        | 1742/11615 [00:04<00:27, 360.16 examples/s]
Map:  16%|█▌        | 1801/11615 [00:04<00:27, 353.94 examples/s]
Map:  16%|█▌        | 1819/11615 [00:05<00:27, 353.35 examples/s]
Map:  16%|█▌        | 1801/11615 [00:05<00:27, 353.93 examples/s]
Map:  16%|█▌        | 1801/11615 [00:05<00:27, 353.89 examples/s]
Map:  16%|█▌        | 1837/11615 [00:05<00:27, 351.49 examples/s]
Map:  15%|█▌        | 1799/11615 [00:05<00:28, 350.55 examples/s]
Map:  15%|█▌        | 1799/11615 [00:05<00:27, 350.92 examples/s]
Map:  15%|█▌        | 1793/11615 [00:05<00:28, 348.56 examples/s]
Map:  16%|█▌        | 1837/11615 [00:05<00:27, 352.10 examples/s]
Map:  16%|█▌        | 1837/11615 [00:05<00:27, 351.75 examples/s]
Map:  16%|█▌        | 1877/11615 [00:05<00:27, 360.43 examples/s]
Map:  16%|█▌        | 1873/11615 [00:05<00:27, 352.76 examples/s]
Map:  16%|█▌        | 1830/11615 [00:05<00:27, 351.73 examples/s]
Map:  16%|█▌        | 1853/11615 [00:05<00:27, 350.17 examples/s]
Map:  16%|█▌        | 1854/11615 [00:05<00:27, 350.31 examples/s]
Map:  16%|█▌        | 1877/11615 [00:05<00:26, 361.23 examples/s]
Map:  16%|█▌        | 1877/11615 [00:05<00:26, 360.72 examples/s]
Map:  16%|█▋        | 1910/11615 [00:05<00:27, 350.39 examples/s]
Map:  16%|█▌        | 1867/11615 [00:05<00:27, 356.07 examples/s]
Map:  17%|█▋        | 1931/11615 [00:05<00:27, 357.35 examples/s]
Map:  16%|█▋        | 1890/11615 [00:05<00:27, 348.92 examples/s]
Map:  16%|█▋        | 1890/11615 [00:05<00:27, 349.07 examples/s]
Map:  17%|█▋        | 1948/11615 [00:05<00:27, 356.61 examples/s]
Map:  17%|█▋        | 1931/11615 [00:05<00:27, 358.23 examples/s]
Map:  17%|█▋        | 1931/11615 [00:05<00:27, 358.12 examples/s]
Map:  17%|█▋        | 1971/11615 [00:05<00:26, 362.02 examples/s]
Map:  17%|█▋        | 1922/11615 [00:05<00:27, 356.38 examples/s]
Map:  17%|█▋        | 1930/11615 [00:05<00:27, 356.42 examples/s]
Map:  17%|█▋        | 1930/11615 [00:05<00:27, 356.79 examples/s]
Map:  17%|█▋        | 1986/11615 [00:05<00:26, 361.63 examples/s]
Map:  17%|█▋        | 1971/11615 [00:05<00:26, 362.45 examples/s]
Map:  17%|█▋        | 1971/11615 [00:05<00:26, 362.83 examples/s]
Map:  17%|█▋        | 1967/11615 [00:05<00:26, 359.31 examples/s]
Map:  17%|█▋        | 1967/11615 [00:05<00:26, 359.59 examples/s]
Map:  17%|█▋        | 1961/11615 [00:05<00:26, 359.40 examples/s]
Map:  17%|█▋        | 2023/11615 [00:05<00:27, 352.62 examples/s]
Map:  17%|█▋        | 2010/11615 [00:05<00:26, 363.16 examples/s]
Map:  17%|█▋        | 2010/11615 [00:05<00:26, 363.50 examples/s]
Map:  18%|█▊        | 2038/11615 [00:05<00:27, 347.94 examples/s]
Map:  17%|█▋        | 2004/11615 [00:05<00:26, 358.95 examples/s]
Map:  17%|█▋        | 2004/11615 [00:05<00:26, 359.71 examples/s]
Map:  17%|█▋        | 2000/11615 [00:05<00:26, 362.56 examples/s]
Map:  18%|█▊        | 2060/11615 [00:05<00:27, 351.68 examples/s]
Map:  18%|█▊        | 2077/11615 [00:05<00:27, 352.35 examples/s]
Map:  18%|█▊        | 2063/11615 [00:05<00:26, 357.08 examples/s]
Map:  18%|█▊        | 2063/11615 [00:05<00:26, 356.81 examples/s]
Map:  18%|█▊        | 2050/11615 [00:05<00:27, 349.55 examples/s]
Map:  18%|█▊        | 2060/11615 [00:05<00:26, 354.05 examples/s]
Map:  18%|█▊        | 2060/11615 [00:05<00:26, 354.57 examples/s]
Map:  18%|█▊        | 2114/11615 [00:05<00:27, 349.97 examples/s]
Map:  18%|█▊        | 2114/11615 [00:05<00:26, 353.38 examples/s]
Map:  18%|█▊        | 2099/11615 [00:05<00:26, 354.97 examples/s]
Map:  18%|█▊        | 2089/11615 [00:05<00:26, 353.44 examples/s]
Map:  18%|█▊        | 2117/11615 [00:05<00:26, 355.70 examples/s]
Map:  19%|█▊        | 2153/11615 [00:05<00:26, 358.70 examples/s]
Map:  19%|█▊        | 2153/11615 [00:05<00:26, 361.38 examples/s]
Map:  18%|█▊        | 2114/11615 [00:05<00:27, 349.80 examples/s]
Map:  18%|█▊        | 2114/11615 [00:05<00:27, 351.20 examples/s]
Map:  18%|█▊        | 2137/11615 [00:06<00:26, 359.22 examples/s]
Map:  18%|█▊        | 2125/11615 [00:06<00:26, 354.45 examples/s]
Map:  19%|█▊        | 2157/11615 [00:06<00:26, 363.33 examples/s]
Map:  19%|█▉        | 2190/11615 [00:06<00:26, 355.38 examples/s]
Map:  19%|█▉        | 2190/11615 [00:06<00:26, 356.23 examples/s]
Map:  19%|█▊        | 2153/11615 [00:06<00:26, 356.88 examples/s]
Map:  19%|█▊        | 2153/11615 [00:06<00:26, 358.35 examples/s]
Map:  19%|█▊        | 2174/11615 [00:06<00:26, 360.89 examples/s]
Map:  19%|█▊        | 2163/11615 [00:06<00:26, 359.52 examples/s]
Map:  19%|█▉        | 2229/11615 [00:06<00:26, 358.93 examples/s]
Map:  19%|█▉        | 2229/11615 [00:06<00:26, 359.25 examples/s]
Map:  19%|█▉        | 2190/11615 [00:06<00:26, 352.85 examples/s]
Map:  19%|█▉        | 2211/11615 [00:06<00:26, 361.08 examples/s]
Map:  19%|█▉        | 2190/11615 [00:06<00:26, 354.09 examples/s]
Map:  19%|█▉        | 2211/11615 [00:06<00:26, 357.87 examples/s]
Map:  19%|█▉        | 2200/11615 [00:06<00:26, 356.33 examples/s]
Map:  20%|█▉        | 2268/11615 [00:06<00:25, 360.50 examples/s]
Map:  20%|█▉        | 2268/11615 [00:06<00:25, 360.19 examples/s]
Map:  19%|█▉        | 2250/11615 [00:06<00:25, 362.65 examples/s]
Map:  19%|█▉        | 2229/11615 [00:06<00:26, 355.84 examples/s]
Map:  19%|█▉        | 2229/11615 [00:06<00:26, 356.90 examples/s]
Map:  19%|█▉        | 2250/11615 [00:06<00:26, 359.43 examples/s]
Map:  19%|█▉        | 2237/11615 [00:06<00:26, 357.86 examples/s]
Map:  20%|█▉        | 2306/11615 [00:06<00:25, 362.58 examples/s]
Map:  20%|█▉        | 2305/11615 [00:06<00:25, 361.75 examples/s]
Map:  20%|█▉        | 2268/11615 [00:06<00:26, 357.06 examples/s]
Map:  20%|█▉        | 2287/11615 [00:06<00:25, 360.41 examples/s]
Map:  20%|█▉        | 2268/11615 [00:06<00:26, 357.98 examples/s]
Map:  20%|█▉        | 2273/11615 [00:06<00:26, 354.84 examples/s]
Map:  20%|█▉        | 2305/11615 [00:06<00:25, 362.06 examples/s]
Map:  20%|██        | 2343/11615 [00:06<00:25, 360.45 examples/s]
Map:  20%|█▉        | 2305/11615 [00:06<00:25, 359.64 examples/s]
Map:  20%|█▉        | 2305/11615 [00:06<00:25, 360.01 examples/s]
Map:  20%|██        | 2325/11615 [00:06<00:25, 362.02 examples/s]
Map:  20%|██        | 2361/11615 [00:06<00:25, 363.81 examples/s]
Map:  20%|█▉        | 2311/11615 [00:06<00:25, 359.40 examples/s]
Map:  21%|██        | 2382/11615 [00:06<00:25, 366.46 examples/s]
Map:  20%|██        | 2362/11615 [00:06<00:25, 365.51 examples/s]
Map:  20%|██        | 2365/11615 [00:06<00:25, 368.99 examples/s]
Map:  20%|██        | 2349/11615 [00:06<00:25, 363.28 examples/s]
Map:  20%|██        | 2361/11615 [00:06<00:25, 360.78 examples/s]
Map:  20%|██        | 2361/11615 [00:06<00:25, 361.53 examples/s]
Map:  21%|██        | 2416/11615 [00:06<00:25, 360.14 examples/s]
Map:  21%|██        | 2437/11615 [00:06<00:25, 366.06 examples/s]
Map:  21%|██        | 2416/11615 [00:06<00:25, 361.40 examples/s]
Map:  21%|██        | 2420/11615 [00:06<00:25, 359.37 examples/s]
Map:  21%|██        | 2457/11615 [00:06<00:24, 370.84 examples/s]
Map:  21%|██        | 2402/11615 [00:06<00:25, 355.92 examples/s]
Map:  21%|██        | 2416/11615 [00:06<00:25, 357.20 examples/s]
Map:  21%|██        | 2416/11615 [00:06<00:25, 357.87 examples/s]
Map:  21%|██▏       | 2476/11615 [00:06<00:24, 371.70 examples/s]
Map:  21%|██        | 2457/11615 [00:06<00:24, 371.74 examples/s]
Map:  21%|██        | 2461/11615 [00:06<00:24, 368.88 examples/s]
Map:  21%|██        | 2442/11615 [00:06<00:25, 364.07 examples/s]
Map:  21%|██        | 2456/11615 [00:06<00:24, 366.99 examples/s]
Map:  21%|██        | 2457/11615 [00:06<00:24, 367.67 examples/s]
Map:  22%|██▏       | 2512/11615 [00:06<00:24, 367.26 examples/s]
Map:  22%|██▏       | 2499/11615 [00:06<00:24, 369.12 examples/s]
Map:  22%|██▏       | 2530/11615 [00:07<00:24, 363.82 examples/s]
Map:  21%|██▏       | 2482/11615 [00:06<00:24, 370.98 examples/s]
Map:  22%|██▏       | 2512/11615 [00:07<00:24, 367.83 examples/s]
Map:  22%|██▏       | 2550/11615 [00:07<00:24, 366.04 examples/s]
Map:  22%|██▏       | 2511/11615 [00:07<00:25, 362.79 examples/s]
Map:  22%|██▏       | 2512/11615 [00:07<00:25, 363.37 examples/s]
Map:  22%|██▏       | 2538/11615 [00:07<00:24, 370.76 examples/s]
Map:  22%|██▏       | 2569/11615 [00:07<00:24, 366.04 examples/s]
Map:  22%|██▏       | 2520/11615 [00:07<00:24, 365.15 examples/s]
Map:  22%|██▏       | 2550/11615 [00:07<00:24, 366.82 examples/s]
Map:  22%|██▏       | 2550/11615 [00:07<00:25, 362.51 examples/s]
Map:  22%|██▏       | 2550/11615 [00:07<00:25, 362.01 examples/s]
Map:  22%|██▏       | 2604/11615 [00:07<00:25, 359.51 examples/s]
Map:  22%|██▏       | 2560/11615 [00:07<00:24, 366.25 examples/s]
Map:  22%|██▏       | 2590/11615 [00:07<00:25, 357.98 examples/s]
Map:  23%|██▎       | 2622/11615 [00:07<00:25, 355.00 examples/s]
Map:  22%|██▏       | 2604/11615 [00:07<00:25, 360.06 examples/s]
Map:  22%|██▏       | 2603/11615 [00:07<00:25, 354.76 examples/s]
Map:  22%|██▏       | 2603/11615 [00:07<00:25, 354.80 examples/s]
Map:  23%|██▎       | 2629/11615 [00:07<00:25, 358.83 examples/s]
Map:  23%|██▎       | 2660/11615 [00:07<00:25, 355.48 examples/s]
Map:  23%|██▎       | 2660/11615 [00:07<00:25, 357.99 examples/s]
Map:  22%|██▏       | 2613/11615 [00:07<00:25, 357.76 examples/s]
Map:  23%|██▎       | 2660/11615 [00:07<00:25, 357.92 examples/s]
Map:  23%|██▎       | 2639/11615 [00:07<00:25, 354.86 examples/s]
Map:  23%|██▎       | 2639/11615 [00:07<00:25, 354.27 examples/s]
Map:  23%|██▎       | 2700/11615 [00:07<00:24, 361.02 examples/s]
Map:  23%|██▎       | 2700/11615 [00:07<00:24, 362.72 examples/s]
Map:  23%|██▎       | 2682/11615 [00:07<00:25, 354.69 examples/s]
Map:  23%|██▎       | 2668/11615 [00:07<00:25, 357.86 examples/s]
Map:  23%|██▎       | 2700/11615 [00:07<00:24, 362.89 examples/s]
Map:  23%|██▎       | 2675/11615 [00:07<00:25, 355.42 examples/s]
Map:  23%|██▎       | 2675/11615 [00:07<00:25, 354.46 examples/s]
Map:  24%|██▎       | 2740/11615 [00:07<00:24, 360.87 examples/s]
Map:  24%|██▎       | 2740/11615 [00:07<00:24, 362.34 examples/s]
Map:  23%|██▎       | 2722/11615 [00:07<00:24, 363.17 examples/s]
Map:  23%|██▎       | 2706/11615 [00:07<00:24, 359.39 examples/s]
Map:  23%|██▎       | 2713/11615 [00:07<00:24, 359.55 examples/s]
Map:  24%|██▎       | 2740/11615 [00:07<00:24, 362.63 examples/s]
Map:  23%|██▎       | 2713/11615 [00:07<00:24, 358.32 examples/s]
Map:  24%|██▍       | 2779/11615 [00:07<00:24, 362.83 examples/s]
Map:  24%|██▍       | 2779/11615 [00:07<00:24, 364.71 examples/s]
Map:  24%|██▍       | 2760/11615 [00:07<00:24, 364.33 examples/s]
Map:  24%|██▎       | 2745/11615 [00:07<00:24, 361.69 examples/s]
Map:  24%|██▎       | 2751/11615 [00:07<00:24, 363.58 examples/s]
Map:  24%|██▎       | 2751/11615 [00:07<00:24, 362.20 examples/s]
Map:  24%|██▍       | 2779/11615 [00:07<00:24, 365.01 examples/s]
Map:  24%|██▍       | 2797/11615 [00:07<00:24, 362.74 examples/s]
Map:  24%|██▍       | 2834/11615 [00:07<00:24, 360.61 examples/s]
Map:  24%|██▍       | 2834/11615 [00:07<00:24, 362.64 examples/s]
Map:  24%|██▍       | 2782/11615 [00:07<00:24, 360.01 examples/s]
Map:  24%|██▍       | 2805/11615 [00:07<00:24, 360.06 examples/s]
Map:  24%|██▍       | 2805/11615 [00:07<00:24, 358.15 examples/s]
Map:  24%|██▍       | 2834/11615 [00:07<00:24, 362.30 examples/s]
Map:  24%|██▍       | 2834/11615 [00:07<00:24, 362.62 examples/s]
Map:  24%|██▍       | 2820/11615 [00:07<00:24, 357.57 examples/s]
Map:  25%|██▍       | 2887/11615 [00:08<00:24, 356.27 examples/s]
Map:  25%|██▍       | 2887/11615 [00:08<00:24, 357.83 examples/s]
Map:  25%|██▍       | 2860/11615 [00:08<00:24, 356.35 examples/s]
Map:  25%|██▍       | 2857/11615 [00:08<00:24, 358.54 examples/s]
Map:  25%|██▍       | 2887/11615 [00:08<00:24, 357.18 examples/s]
Map:  25%|██▍       | 2860/11615 [00:08<00:24, 354.54 examples/s]
Map:  25%|██▍       | 2887/11615 [00:08<00:24, 356.86 examples/s]
Map:  25%|██▌       | 2927/11615 [00:08<00:23, 365.49 examples/s]
Map:  25%|██▌       | 2927/11615 [00:08<00:23, 366.40 examples/s]
Map:  25%|██▌       | 2927/11615 [00:08<00:23, 365.94 examples/s]
Map:  25%|██▌       | 2927/11615 [00:08<00:23, 365.19 examples/s]
Map:  26%|██▌       | 2965/11615 [00:08<00:23, 367.48 examples/s]
Map:  26%|██▌       | 2965/11615 [00:08<00:23, 368.69 examples/s]
Map:  25%|██▌       | 2917/11615 [00:08<00:24, 360.19 examples/s]
Map:  25%|██▌       | 2916/11615 [00:08<00:24, 358.57 examples/s]
Map:  25%|██▌       | 2913/11615 [00:08<00:24, 357.64 examples/s]
Map:  26%|██▌       | 2965/11615 [00:08<00:23, 368.20 examples/s]
Map:  26%|██▌       | 2965/11615 [00:08<00:23, 367.42 examples/s]
Map:  25%|██▌       | 2957/11615 [00:08<00:23, 366.69 examples/s]
Map:  25%|██▌       | 2955/11615 [00:08<00:23, 363.89 examples/s]
Map:  25%|██▌       | 2953/11615 [00:08<00:23, 365.43 examples/s]
Map:  26%|██▌       | 3020/11615 [00:08<00:23, 364.67 examples/s]
Map:  26%|██▌       | 3020/11615 [00:08<00:23, 366.14 examples/s]
Map:  26%|██▌       | 3021/11615 [00:08<00:23, 366.80 examples/s]
Map:  26%|██▌       | 3020/11615 [00:08<00:23, 364.57 examples/s]
Map:  26%|██▌       | 2990/11615 [00:08<00:23, 360.36 examples/s]
Map:  26%|██▌       | 3011/11615 [00:08<00:23, 360.02 examples/s]
Map:  26%|██▌       | 3010/11615 [00:08<00:23, 359.54 examples/s]
Map:  26%|██▋       | 3071/11615 [00:08<00:24, 353.65 examples/s]
Map:  26%|██▋       | 3071/11615 [00:08<00:24, 354.04 examples/s]
Map:  26%|██▋       | 3059/11615 [00:08<00:23, 361.84 examples/s]
Map:  26%|██▌       | 3030/11615 [00:08<00:23, 361.41 examples/s]
Map:  26%|██▋       | 3049/11615 [00:08<00:23, 361.63 examples/s]
Map:  26%|██▋       | 3071/11615 [00:08<00:24, 352.12 examples/s]
Map:  26%|██▋       | 3049/11615 [00:08<00:23, 360.94 examples/s]
Map:  27%|██▋       | 3110/11615 [00:08<00:24, 353.85 examples/s]
Map:  27%|██▋       | 3110/11615 [00:08<00:24, 354.13 examples/s]
Map:  27%|██▋       | 3110/11615 [00:08<00:24, 351.49 examples/s]
Map:  27%|██▋       | 3110/11615 [00:08<00:24, 352.25 examples/s]
Map:  27%|██▋       | 3081/11615 [00:08<00:24, 348.58 examples/s]
Map:  27%|██▋       | 3150/11615 [00:08<00:23, 356.80 examples/s]
Map:  27%|██▋       | 3149/11615 [00:08<00:23, 360.99 examples/s]
Map:  27%|██▋       | 3100/11615 [00:08<00:24, 346.72 examples/s]
Map:  27%|██▋       | 3100/11615 [00:08<00:24, 346.47 examples/s]
Map:  27%|██▋       | 3149/11615 [00:08<00:23, 359.14 examples/s]
Map:  27%|██▋       | 3120/11615 [00:08<00:24, 352.10 examples/s]
Map:  27%|██▋       | 3150/11615 [00:08<00:23, 355.80 examples/s]
Map:  27%|██▋       | 3139/11615 [00:08<00:23, 354.68 examples/s]
Map:  27%|██▋       | 3139/11615 [00:08<00:23, 354.42 examples/s]
Map:  28%|██▊       | 3204/11615 [00:08<00:23, 354.41 examples/s]
Map:  28%|██▊       | 3200/11615 [00:08<00:24, 349.72 examples/s]
Map:  27%|██▋       | 3157/11615 [00:08<00:23, 352.83 examples/s]
Map:  28%|██▊       | 3201/11615 [00:08<00:23, 350.63 examples/s]
Map:  28%|██▊       | 3204/11615 [00:08<00:23, 353.17 examples/s]
Map:  28%|██▊       | 3241/11615 [00:08<00:23, 356.09 examples/s]
Map:  28%|██▊       | 3240/11615 [00:08<00:23, 356.07 examples/s]
Map:  27%|██▋       | 3190/11615 [00:09<00:24, 345.26 examples/s]
Map:  27%|██▋       | 3190/11615 [00:09<00:24, 344.87 examples/s]
Map:  28%|██▊       | 3240/11615 [00:09<00:23, 354.94 examples/s]
Map:  28%|██▊       | 3241/11615 [00:09<00:23, 354.94 examples/s]
Map:  28%|██▊       | 3210/11615 [00:09<00:24, 347.43 examples/s]
Map:  28%|██▊       | 3280/11615 [00:09<00:23, 360.61 examples/s]
Map:  28%|██▊       | 3278/11615 [00:09<00:23, 361.73 examples/s]
Map:  28%|██▊       | 3231/11615 [00:09<00:23, 356.28 examples/s]
Map:  28%|██▊       | 3231/11615 [00:09<00:23, 355.70 examples/s]
Map:  28%|██▊       | 3279/11615 [00:09<00:23, 362.26 examples/s]
Map:  28%|██▊       | 3280/11615 [00:09<00:23, 359.49 examples/s]
Map:  28%|██▊       | 3250/11615 [00:09<00:23, 355.44 examples/s]
Map:  29%|██▊       | 3315/11615 [00:09<00:22, 362.04 examples/s]
Map:  29%|██▊       | 3320/11615 [00:09<00:22, 365.19 examples/s]
Map:  28%|██▊       | 3270/11615 [00:09<00:23, 355.64 examples/s]
Map:  28%|██▊       | 3270/11615 [00:09<00:23, 354.78 examples/s]
Map:  29%|██▊       | 3317/11615 [00:09<00:22, 365.01 examples/s]
Map:  28%|██▊       | 3287/11615 [00:09<00:23, 357.80 examples/s]
Map:  29%|██▊       | 3320/11615 [00:09<00:22, 364.55 examples/s]
Map:  28%|██▊       | 3309/11615 [00:09<00:23, 358.87 examples/s]
Map:  28%|██▊       | 3309/11615 [00:09<00:23, 359.86 examples/s]
Map:  29%|██▉       | 3375/11615 [00:09<00:22, 362.59 examples/s]
Map:  29%|██▉       | 3372/11615 [00:09<00:23, 358.04 examples/s]
Map:  29%|██▊       | 3324/11615 [00:09<00:23, 359.48 examples/s]
Map:  29%|██▉       | 3372/11615 [00:09<00:23, 358.38 examples/s]
Map:  29%|██▉       | 3374/11615 [00:09<00:22, 361.03 examples/s]
Map:  29%|██▉       | 3410/11615 [00:09<00:22, 359.75 examples/s]
Map:  29%|██▉       | 3363/11615 [00:09<00:23, 351.71 examples/s]
Map:  29%|██▉       | 3363/11615 [00:09<00:23, 353.91 examples/s]
Map:  30%|██▉       | 3430/11615 [00:09<00:22, 361.40 examples/s]
Map:  29%|██▉       | 3410/11615 [00:09<00:22, 360.15 examples/s]
Map:  29%|██▉       | 3377/11615 [00:09<00:23, 355.54 examples/s]
Map:  29%|██▉       | 3411/11615 [00:09<00:22, 361.23 examples/s]
Map:  30%|██▉       | 3450/11615 [00:09<00:22, 363.72 examples/s]
Map:  29%|██▉       | 3400/11615 [00:09<00:23, 355.37 examples/s]
Map:  29%|██▉       | 3400/11615 [00:09<00:22, 357.58 examples/s]
Map:  30%|██▉       | 3470/11615 [00:09<00:22, 364.95 examples/s]
Map:  30%|██▉       | 3450/11615 [00:09<00:22, 364.19 examples/s]
Map:  29%|██▉       | 3416/11615 [00:09<00:22, 358.61 examples/s]
Map:  30%|██▉       | 3450/11615 [00:09<00:22, 363.93 examples/s]
Map:  30%|███       | 3487/11615 [00:09<00:22, 361.05 examples/s]
Map:  30%|██▉       | 3439/11615 [00:09<00:22, 360.22 examples/s]
Map:  30%|██▉       | 3439/11615 [00:09<00:22, 362.42 examples/s]
Map:  30%|███       | 3507/11615 [00:09<00:22, 363.22 examples/s]
Map:  30%|███       | 3487/11615 [00:09<00:22, 361.22 examples/s]
Map:  30%|███       | 3487/11615 [00:09<00:22, 360.80 examples/s]
Map:  30%|███       | 3526/11615 [00:09<00:22, 365.05 examples/s]
Map:  30%|██▉       | 3472/11615 [00:09<00:22, 359.71 examples/s]
Map:  30%|██▉       | 3477/11615 [00:09<00:22, 358.23 examples/s]
Map:  30%|██▉       | 3477/11615 [00:09<00:22, 360.21 examples/s]
Map:  30%|███       | 3526/11615 [00:09<00:22, 364.92 examples/s]
Map:  30%|███       | 3526/11615 [00:09<00:22, 364.75 examples/s]
Map:  31%|███       | 3561/11615 [00:09<00:22, 359.22 examples/s]
Map:  31%|███       | 3580/11615 [00:09<00:22, 359.59 examples/s]
Map:  30%|███       | 3526/11615 [00:09<00:22, 358.04 examples/s]
Map:  30%|███       | 3531/11615 [00:09<00:22, 356.39 examples/s]
Map:  30%|███       | 3531/11615 [00:09<00:22, 358.34 examples/s]
Map:  31%|███       | 3600/11615 [00:09<00:22, 358.18 examples/s]
Map:  31%|███       | 3580/11615 [00:09<00:22, 358.86 examples/s]
Map:  31%|███       | 3580/11615 [00:10<00:22, 359.37 examples/s]
Map:  31%|███       | 3621/11615 [00:10<00:21, 366.85 examples/s]
Map:  31%|███       | 3562/11615 [00:10<00:22, 355.82 examples/s]
Map:  31%|███       | 3570/11615 [00:10<00:22, 355.45 examples/s]
Map:  31%|███       | 3570/11615 [00:10<00:22, 357.73 examples/s]
Map:  31%|███▏      | 3640/11615 [00:10<00:21, 366.32 examples/s]
Map:  31%|███       | 3621/11615 [00:10<00:21, 366.41 examples/s]
Map:  31%|███       | 3621/11615 [00:10<00:21, 366.82 examples/s]
Map:  32%|███▏      | 3660/11615 [00:10<00:21, 369.00 examples/s]
Map:  31%|███       | 3600/11615 [00:10<00:22, 354.07 examples/s]
Map:  31%|███       | 3607/11615 [00:10<00:22, 356.65 examples/s]
Map:  31%|███       | 3607/11615 [00:10<00:22, 359.14 examples/s]
Map:  32%|███▏      | 3660/11615 [00:10<00:21, 369.12 examples/s]
Map:  32%|███▏      | 3660/11615 [00:10<00:21, 369.52 examples/s]
Map:  32%|███▏      | 3690/11615 [00:10<00:22, 351.40 examples/s]
Map:  31%|███▏      | 3640/11615 [00:10<00:22, 362.20 examples/s]
Map:  31%|███▏      | 3647/11615 [00:10<00:21, 366.95 examples/s]
Map:  31%|███▏      | 3647/11615 [00:10<00:21, 364.82 examples/s]
Map:  32%|███▏      | 3710/11615 [00:10<00:22, 353.43 examples/s]
Map:  32%|███▏      | 3730/11615 [00:10<00:21, 360.72 examples/s]
Map:  32%|███▏      | 3710/11615 [00:10<00:22, 354.47 examples/s]
Map:  32%|███▏      | 3710/11615 [00:10<00:22, 354.36 examples/s]
Map:  32%|███▏      | 3750/11615 [00:10<00:21, 359.38 examples/s]
Map:  32%|███▏      | 3690/11615 [00:10<00:22, 347.49 examples/s]
Map:  32%|███▏      | 3695/11615 [00:10<00:22, 348.32 examples/s]
Map:  32%|███▏      | 3695/11615 [00:10<00:22, 346.24 examples/s]
Map:  32%|███▏      | 3750/11615 [00:10<00:21, 360.40 examples/s]
Map:  32%|███▏      | 3750/11615 [00:10<00:21, 360.27 examples/s]
Map:  33%|███▎      | 3784/11615 [00:10<00:22, 355.68 examples/s]
Map:  32%|███▏      | 3735/11615 [00:10<00:21, 359.94 examples/s]
Map:  32%|███▏      | 3735/11615 [00:10<00:22, 357.82 examples/s]
Map:  33%|███▎      | 3790/11615 [00:10<00:25, 311.72 examples/s]
Map:  32%|███▏      | 3730/11615 [00:10<00:24, 317.17 examples/s]
Map:  33%|███▎      | 3835/11615 [00:10<00:23, 329.33 examples/s]
Map:  33%|███▎      | 3801/11615 [00:10<00:25, 304.94 examples/s]
Map:  33%|███▎      | 3804/11615 [00:10<00:24, 318.46 examples/s]
Map:  33%|███▎      | 3785/11615 [00:10<00:25, 310.44 examples/s]
Map:  33%|███▎      | 3776/11615 [00:10<00:25, 308.54 examples/s]
Map:  33%|███▎      | 3840/11615 [00:10<00:24, 312.51 examples/s]
Map:  33%|███▎      | 3781/11615 [00:10<00:26, 298.33 examples/s]
Map:  33%|███▎      | 3871/11615 [00:10<00:23, 333.92 examples/s]
Map:  33%|███▎      | 3843/11615 [00:10<00:23, 333.31 examples/s]
Map:  33%|███▎      | 3840/11615 [00:10<00:24, 321.96 examples/s]
Map:  33%|███▎      | 3820/11615 [00:10<00:24, 318.20 examples/s]
Map:  33%|███▎      | 3817/11615 [00:10<00:25, 310.55 examples/s]
Map:  33%|███▎      | 3812/11615 [00:10<00:24, 317.56 examples/s]
Map:  33%|███▎      | 3878/11615 [00:10<00:23, 325.28 examples/s]
Map:  33%|███▎      | 3879/11615 [00:10<00:22, 337.65 examples/s]
Map:  33%|███▎      | 3877/11615 [00:10<00:23, 329.99 examples/s]
Map:  34%|███▎      | 3910/11615 [00:10<00:22, 338.83 examples/s]
Map:  33%|███▎      | 3860/11615 [00:10<00:23, 332.96 examples/s]
Map:  33%|███▎      | 3855/11615 [00:10<00:23, 326.45 examples/s]
Map:  33%|███▎      | 3850/11615 [00:10<00:23, 331.01 examples/s]
Map:  34%|███▎      | 3913/11615 [00:10<00:23, 329.92 examples/s]
Map:  34%|███▎      | 3916/11615 [00:10<00:22, 340.30 examples/s]
Map:  34%|███▍      | 3947/11615 [00:10<00:22, 344.77 examples/s]
Map:  34%|███▎      | 3897/11615 [00:11<00:22, 341.19 examples/s]
Map:  34%|███▍      | 3930/11615 [00:11<00:22, 334.31 examples/s]
Map:  33%|███▎      | 3890/11615 [00:11<00:23, 328.06 examples/s]
Map:  34%|███▍      | 3950/11615 [00:11<00:22, 336.18 examples/s]
Map:  33%|███▎      | 3889/11615 [00:11<00:22, 339.49 examples/s]
Map:  34%|███▍      | 3952/11615 [00:11<00:22, 343.06 examples/s]
Map:  34%|███▍      | 3984/11615 [00:11<00:21, 349.49 examples/s]
Map:  34%|███▍      | 3971/11615 [00:11<00:22, 346.04 examples/s]
Map:  34%|███▍      | 3924/11615 [00:11<00:22, 339.72 examples/s]
Map:  34%|███▍      | 3929/11615 [00:11<00:22, 338.04 examples/s]
Map:  34%|███▍      | 3990/11615 [00:11<00:21, 346.64 examples/s]
Map:  34%|███▍      | 3950/11615 [00:11<00:22, 340.35 examples/s]
Map:  34%|███▍      | 3987/11615 [00:11<00:22, 342.04 examples/s]
Map:  35%|███▍      | 4020/11615 [00:11<00:21, 346.99 examples/s]
Map:  34%|███▍      | 3961/11615 [00:11<00:22, 343.44 examples/s]
Map:  34%|███▍      | 3968/11615 [00:11<00:21, 347.72 examples/s]
Map:  35%|███▍      | 4027/11615 [00:11<00:21, 349.08 examples/s]
Map:  35%|███▍      | 4009/11615 [00:11<00:21, 347.44 examples/s]
Map:  34%|███▍      | 3989/11615 [00:11<00:21, 350.86 examples/s]
Map:  35%|███▍      | 4022/11615 [00:11<00:22, 341.88 examples/s]
Map:  35%|███▍      | 4060/11615 [00:11<00:21, 353.97 examples/s]
Map:  34%|███▍      | 3999/11615 [00:11<00:21, 352.31 examples/s]
Map:  35%|███▍      | 4065/11615 [00:11<00:21, 355.74 examples/s]
Map:  35%|███▍      | 4046/11615 [00:11<00:21, 350.95 examples/s]
Map:  35%|███▍      | 4060/11615 [00:11<00:21, 349.31 examples/s]
Map:  35%|███▍      | 4020/11615 [00:11<00:22, 344.13 examples/s]
Map:  35%|███▌      | 4098/11615 [00:11<00:20, 358.08 examples/s]
Map:  35%|███▍      | 4043/11615 [00:11<00:21, 351.50 examples/s]
Map:  35%|███▍      | 4035/11615 [00:11<00:21, 349.67 examples/s]
Map:  35%|███▌      | 4098/11615 [00:11<00:21, 355.32 examples/s]
Map:  35%|███▌      | 4100/11615 [00:11<00:21, 352.40 examples/s]
Map:  36%|███▌      | 4135/11615 [00:11<00:20, 358.42 examples/s]
Map:  35%|███▌      | 4120/11615 [00:11<00:21, 353.38 examples/s]
Map:  35%|███▍      | 4060/11615 [00:11<00:21, 350.20 examples/s]
Map:  35%|███▌      | 4080/11615 [00:11<00:21, 349.25 examples/s]
Map:  35%|███▌      | 4073/11615 [00:11<00:21, 354.21 examples/s]
Map:  36%|███▌      | 4135/11615 [00:11<00:20, 356.49 examples/s]
Map:  36%|███▌      | 4173/11615 [00:11<00:20, 360.86 examples/s]
Map:  35%|███▌      | 4097/11615 [00:11<00:21, 354.21 examples/s]
Map:  36%|███▌      | 4140/11615 [00:11<00:21, 353.33 examples/s]
Map:  36%|███▌      | 4161/11615 [00:11<00:20, 357.79 examples/s]
Map:  35%|███▌      | 4120/11615 [00:11<00:21, 355.00 examples/s]
Map:  35%|███▌      | 4110/11615 [00:11<00:21, 353.21 examples/s]
Map:  36%|███▌      | 4173/11615 [00:11<00:20, 360.21 examples/s]
Map:  36%|███▌      | 4134/11615 [00:11<00:21, 354.85 examples/s]
Map:  36%|███▌      | 4180/11615 [00:11<00:20, 360.57 examples/s]
Map:  36%|███▌      | 4197/11615 [00:11<00:20, 354.92 examples/s]
Map:  36%|███▌      | 4158/11615 [00:11<00:20, 360.30 examples/s]
Map:  36%|███▋      | 4225/11615 [00:11<00:21, 350.42 examples/s]
Map:  36%|███▌      | 4148/11615 [00:11<00:20, 357.49 examples/s]
Map:  36%|███▌      | 4171/11615 [00:11<00:20, 355.84 examples/s]
Map:  36%|███▋      | 4233/11615 [00:11<00:20, 352.06 examples/s]
Map:  36%|███▋      | 4225/11615 [00:11<00:21, 350.46 examples/s]
Map:  37%|███▋      | 4262/11615 [00:11<00:20, 354.49 examples/s]
Map:  36%|███▌      | 4186/11615 [00:11<00:20, 357.08 examples/s]
Map:  36%|███▋      | 4233/11615 [00:11<00:20, 352.12 examples/s]
Map:  36%|███▌      | 4210/11615 [00:11<00:21, 350.29 examples/s]
Map:  37%|███▋      | 4271/11615 [00:11<00:20, 353.54 examples/s]
Map:  37%|███▋      | 4263/11615 [00:11<00:20, 355.87 examples/s]
Map:  37%|███▋      | 4300/11615 [00:11<00:20, 354.18 examples/s]
Map:  36%|███▋      | 4222/11615 [00:11<00:21, 344.90 examples/s]
Map:  37%|███▋      | 4270/11615 [00:11<00:20, 353.00 examples/s]
Map:  37%|███▋      | 4249/11615 [00:12<00:20, 357.94 examples/s]
Map:  37%|███▋      | 4240/11615 [00:12<00:20, 351.54 examples/s]
Map:  37%|███▋      | 4310/11615 [00:12<00:20, 353.13 examples/s]
Map:  37%|███▋      | 4300/11615 [00:12<00:20, 353.76 examples/s]
Map:  37%|███▋      | 4338/11615 [00:12<00:20, 357.10 examples/s]
Map:  37%|███▋      | 4260/11615 [00:12<00:21, 350.00 examples/s]
Map:  37%|███▋      | 4308/11615 [00:12<00:20, 356.68 examples/s]
Map:  37%|███▋      | 4276/11615 [00:12<00:20, 352.29 examples/s]
Map:  37%|███▋      | 4350/11615 [00:12<00:21, 342.91 examples/s]
Map:  37%|███▋      | 4338/11615 [00:12<00:20, 357.12 examples/s]
Map:  37%|███▋      | 4302/11615 [00:12<00:20, 353.31 examples/s]
Map:  37%|███▋      | 4344/11615 [00:12<00:20, 354.27 examples/s]
Map:  37%|███▋      | 4300/11615 [00:12<00:20, 353.28 examples/s]
Map:  38%|███▊      | 4389/11615 [00:12<00:20, 347.73 examples/s]
Map:  37%|███▋      | 4313/11615 [00:12<00:20, 353.68 examples/s]
Map:  38%|███▊      | 4389/11615 [00:12<00:20, 352.38 examples/s]
Map:  37%|███▋      | 4338/11615 [00:12<00:20, 353.93 examples/s]
Map:  37%|███▋      | 4338/11615 [00:12<00:20, 356.09 examples/s]
Map:  38%|███▊      | 4380/11615 [00:12<00:20, 348.10 examples/s]
Map:  38%|███▊      | 4389/11615 [00:12<00:20, 348.26 examples/s]
Map:  37%|███▋      | 4349/11615 [00:12<00:20, 353.54 examples/s]
Map:  38%|███▊      | 4441/11615 [00:12<00:20, 342.88 examples/s]
Map:  38%|███▊      | 4389/11615 [00:12<00:20, 346.12 examples/s]
Map:  38%|███▊      | 4442/11615 [00:12<00:20, 346.80 examples/s]
Map:  38%|███▊      | 4389/11615 [00:12<00:20, 346.71 examples/s]
Map:  38%|███▊      | 4431/11615 [00:12<00:20, 342.27 examples/s]
Map:  38%|███▊      | 4442/11615 [00:12<00:20, 343.43 examples/s]
Map:  38%|███▊      | 4400/11615 [00:12<00:21, 340.81 examples/s]
Map:  39%|███▊      | 4494/11615 [00:12<00:20, 340.47 examples/s]
Map:  38%|███▊      | 4468/11615 [00:12<00:20, 348.72 examples/s]
Map:  38%|███▊      | 4441/11615 [00:12<00:20, 342.14 examples/s]
Map:  39%|███▊      | 4495/11615 [00:12<00:20, 344.65 examples/s]
Map:  38%|███▊      | 4435/11615 [00:12<00:21, 341.00 examples/s]
Map:  38%|███▊      | 4442/11615 [00:12<00:20, 342.51 examples/s]
Map:  39%|███▊      | 4494/11615 [00:12<00:20, 340.65 examples/s]
Map:  39%|███▉      | 4532/11615 [00:12<00:20, 348.90 examples/s]
Map:  39%|███▉      | 4533/11615 [00:12<00:20, 352.32 examples/s]
Map:  38%|███▊      | 4470/11615 [00:12<00:20, 341.12 examples/s]
Map:  39%|███▉      | 4522/11615 [00:12<00:20, 347.38 examples/s]
Map:  39%|███▉      | 4532/11615 [00:12<00:20, 349.03 examples/s]
Map:  39%|███▊      | 4494/11615 [00:12<00:20, 339.36 examples/s]
Map:  39%|███▊      | 4494/11615 [00:12<00:20, 339.77 examples/s]
Map:  39%|███▉      | 4581/11615 [00:12<00:20, 339.62 examples/s]
Map:  39%|███▉      | 4505/11615 [00:12<00:20, 340.16 examples/s]
Map:  39%|███▉      | 4582/11615 [00:12<00:20, 340.93 examples/s]
Map:  39%|███▉      | 4532/11615 [00:12<00:20, 347.50 examples/s]
Map:  39%|███▉      | 4532/11615 [00:12<00:20, 347.81 examples/s]
Map:  39%|███▉      | 4573/11615 [00:12<00:20, 339.86 examples/s]
Map:  39%|███▉      | 4581/11615 [00:12<00:20, 339.43 examples/s]
Map:  40%|███▉      | 4619/11615 [00:12<00:20, 344.94 examples/s]
Map:  39%|███▉      | 4542/11615 [00:12<00:20, 344.92 examples/s]
Map:  40%|███▉      | 4619/11615 [00:12<00:20, 345.51 examples/s]
Map:  40%|███▉      | 4610/11615 [00:12<00:20, 342.34 examples/s]
Map:  39%|███▉      | 4581/11615 [00:13<00:20, 338.29 examples/s]
Map:  40%|███▉      | 4619/11615 [00:13<00:20, 344.12 examples/s]
Map:  40%|████      | 4655/11615 [00:13<00:20, 346.71 examples/s]
Map:  39%|███▉      | 4581/11615 [00:13<00:20, 337.92 examples/s]
Map:  40%|████      | 4655/11615 [00:13<00:20, 347.30 examples/s]
Map:  40%|███▉      | 4592/11615 [00:13<00:20, 339.13 examples/s]
Map:  40%|████      | 4647/11615 [00:13<00:20, 344.98 examples/s]
Map:  40%|████      | 4654/11615 [00:13<00:20, 344.66 examples/s]
Map:  40%|███▉      | 4618/11615 [00:13<00:20, 342.31 examples/s]
Map:  40%|████      | 4690/11615 [00:13<00:20, 343.88 examples/s]
Map:  40%|███▉      | 4618/11615 [00:13<00:20, 342.12 examples/s]
Map:  40%|████      | 4690/11615 [00:13<00:20, 344.51 examples/s]
Map:  40%|███▉      | 4630/11615 [00:13<00:20, 343.20 examples/s]
Map:  40%|████      | 4684/11615 [00:13<00:19, 349.40 examples/s]
Map:  40%|████      | 4654/11615 [00:13<00:20, 343.89 examples/s]
Map:  40%|████      | 4690/11615 [00:13<00:20, 343.32 examples/s]
Map:  41%|████      | 4731/11615 [00:13<00:19, 361.23 examples/s]
Map:  40%|████      | 4653/11615 [00:13<00:20, 342.84 examples/s]
Map:  41%|████      | 4731/11615 [00:13<00:19, 362.03 examples/s]
Map:  40%|████      | 4667/11615 [00:13<00:19, 348.56 examples/s]
Map:  41%|████      | 4724/11615 [00:13<00:19, 360.99 examples/s]
Map:  41%|████      | 4731/11615 [00:13<00:19, 360.74 examples/s]
Map:  40%|████      | 4690/11615 [00:13<00:20, 342.35 examples/s]
Map:  41%|████      | 4768/11615 [00:13<00:19, 358.37 examples/s]
Map:  40%|████      | 4690/11615 [00:13<00:20, 342.40 examples/s]
Map:  41%|████      | 4768/11615 [00:13<00:19, 358.50 examples/s]
Map:  41%|████      | 4768/11615 [00:13<00:19, 357.88 examples/s]
Map:  41%|████      | 4731/11615 [00:13<00:19, 359.15 examples/s]
Map:  41%|████      | 4722/11615 [00:13<00:19, 352.65 examples/s]
Map:  41%|████      | 4777/11615 [00:13<00:19, 354.83 examples/s]
Map:  41%|████      | 4731/11615 [00:13<00:19, 358.80 examples/s]
Map:  41%|████▏     | 4820/11615 [00:13<00:19, 348.02 examples/s]
Map:  41%|████▏     | 4820/11615 [00:13<00:19, 348.22 examples/s]
Map:  41%|████      | 4768/11615 [00:13<00:19, 355.83 examples/s]
Map:  41%|████      | 4760/11615 [00:13<00:19, 355.00 examples/s]
Map:  41%|████      | 4768/11615 [00:13<00:19, 354.87 examples/s]
Map:  41%|████▏     | 4819/11615 [00:13<00:19, 347.33 examples/s]
Map:  42%|████▏     | 4856/11615 [00:13<00:19, 347.77 examples/s]
Map:  42%|████▏     | 4829/11615 [00:13<00:19, 348.16 examples/s]
Map:  42%|████▏     | 4856/11615 [00:13<00:19, 347.66 examples/s]
Map:  41%|████▏     | 4820/11615 [00:13<00:19, 345.33 examples/s]
Map:  41%|████▏     | 4809/11615 [00:13<00:19, 342.30 examples/s]
Map:  42%|████▏     | 4865/11615 [00:13<00:19, 349.23 examples/s]
Map:  41%|████▏     | 4820/11615 [00:13<00:19, 344.49 examples/s]
Map:  42%|████▏     | 4870/11615 [00:13<00:19, 341.21 examples/s]
Map:  42%|████▏     | 4908/11615 [00:13<00:19, 341.77 examples/s]
Map:  42%|████▏     | 4907/11615 [00:13<00:19, 341.72 examples/s]
Map:  42%|████▏     | 4856/11615 [00:13<00:19, 344.97 examples/s]
Map:  42%|████▏     | 4855/11615 [00:13<00:19, 344.54 examples/s]
Map:  42%|████▏     | 4905/11615 [00:13<00:19, 340.25 examples/s]
Map:  42%|████▏     | 4861/11615 [00:13<00:19, 341.54 examples/s]
Map:  42%|████▏     | 4918/11615 [00:13<00:19, 341.07 examples/s]
Map:  43%|████▎     | 4960/11615 [00:13<00:19, 340.88 examples/s]
Map:  43%|████▎     | 4960/11615 [00:13<00:19, 340.17 examples/s]
Map:  43%|████▎     | 4940/11615 [00:13<00:19, 335.79 examples/s]
Map:  42%|████▏     | 4907/11615 [00:13<00:19, 339.33 examples/s]
Map:  42%|████▏     | 4905/11615 [00:13<00:19, 338.18 examples/s]
Map:  43%|████▎     | 4953/11615 [00:13<00:19, 341.10 examples/s]
Map:  43%|████▎     | 4998/11615 [00:14<00:19, 347.70 examples/s]
Map:  42%|████▏     | 4910/11615 [00:13<00:20, 331.82 examples/s]
Map:  43%|████▎     | 4998/11615 [00:14<00:19, 345.62 examples/s]
Map:  43%|████▎     | 4977/11615 [00:14<00:19, 343.19 examples/s]
Map:  43%|████▎     | 4990/11615 [00:14<00:19, 343.76 examples/s]
Map:  43%|████▎     | 4940/11615 [00:14<00:19, 333.90 examples/s]
Map:  43%|████▎     | 4959/11615 [00:14<00:19, 339.51 examples/s]
Map:  43%|████▎     | 4948/11615 [00:14<00:19, 338.61 examples/s]
Map:  43%|████▎     | 5012/11615 [00:14<00:19, 343.05 examples/s]
Map:  43%|████▎     | 5050/11615 [00:14<00:18, 346.11 examples/s]
Map:  43%|████▎     | 5026/11615 [00:14<00:19, 346.40 examples/s]
Map:  43%|████▎     | 4976/11615 [00:14<00:19, 339.50 examples/s]
Map:  43%|████▎     | 5050/11615 [00:14<00:19, 344.12 examples/s]
Map:  43%|████▎     | 4994/11615 [00:14<00:19, 340.93 examples/s]
Map:  43%|████▎     | 5050/11615 [00:14<00:19, 344.84 examples/s]
Map:  44%|████▍     | 5090/11615 [00:14<00:18, 353.76 examples/s]
Map:  43%|████▎     | 5000/11615 [00:14<00:19, 339.36 examples/s]
Map:  44%|████▎     | 5067/11615 [00:14<00:18, 363.46 examples/s]
Map:  43%|████▎     | 5012/11615 [00:14<00:19, 341.29 examples/s]
Map:  44%|████▍     | 5090/11615 [00:14<00:18, 351.97 examples/s]
Map:  43%|████▎     | 5030/11615 [00:14<00:19, 341.15 examples/s]
Map:  43%|████▎     | 5035/11615 [00:14<00:19, 340.54 examples/s]
Map:  44%|████▍     | 5090/11615 [00:14<00:18, 353.67 examples/s]
Map:  44%|████▍     | 5129/11615 [00:14<00:18, 360.29 examples/s]
Map:  43%|████▎     | 5049/11615 [00:14<00:19, 345.00 examples/s]
Map:  44%|████▍     | 5129/11615 [00:14<00:18, 358.46 examples/s]
Map:  44%|████▎     | 5070/11615 [00:14<00:18, 355.74 examples/s]
Map:  44%|████▍     | 5120/11615 [00:14<00:18, 355.21 examples/s]
Map:  44%|████▎     | 5075/11615 [00:14<00:18, 354.05 examples/s]
Map:  44%|████▍     | 5129/11615 [00:14<00:18, 360.24 examples/s]
Map:  44%|████▍     | 5086/11615 [00:14<00:18, 350.55 examples/s]
Map:  45%|████▍     | 5183/11615 [00:14<00:17, 357.50 examples/s]
Map:  44%|████▍     | 5159/11615 [00:14<00:17, 358.74 examples/s]
Map:  45%|████▍     | 5182/11615 [00:14<00:18, 355.54 examples/s]
Map:  44%|████▍     | 5124/11615 [00:14<00:18, 354.85 examples/s]
Map:  44%|████▍     | 5124/11615 [00:14<00:18, 355.78 examples/s]
Map:  45%|████▍     | 5183/11615 [00:14<00:18, 356.77 examples/s]
Map:  45%|████▍     | 5222/11615 [00:14<00:17, 361.99 examples/s]
Map:  44%|████▍     | 5130/11615 [00:14<00:18, 350.43 examples/s]
Map:  45%|████▍     | 5197/11615 [00:14<00:17, 363.64 examples/s]
Map:  45%|████▍     | 5220/11615 [00:14<00:17, 359.86 examples/s]
Map:  44%|████▍     | 5160/11615 [00:14<00:18, 350.77 examples/s]
Map:  44%|████▍     | 5160/11615 [00:14<00:18, 351.16 examples/s]
Map:  45%|████▌     | 5261/11615 [00:14<00:17, 366.82 examples/s]
Map:  45%|████▍     | 5222/11615 [00:14<00:17, 361.09 examples/s]
Map:  44%|████▍     | 5167/11615 [00:14<00:18, 353.25 examples/s]
Map:  45%|████▌     | 5260/11615 [00:14<00:17, 364.89 examples/s]
Map:  45%|████▍     | 5200/11615 [00:14<00:17, 358.34 examples/s]
Map:  45%|████▌     | 5253/11615 [00:14<00:17, 365.59 examples/s]
Map:  45%|████▍     | 5200/11615 [00:14<00:17, 356.87 examples/s]
Map:  45%|████▌     | 5260/11615 [00:14<00:17, 364.59 examples/s]
Map:  45%|████▍     | 5205/11615 [00:14<00:17, 357.00 examples/s]
Map:  46%|████▌     | 5300/11615 [00:14<00:17, 364.07 examples/s]
Map:  45%|████▌     | 5238/11615 [00:14<00:17, 363.29 examples/s]
Map:  46%|████▌     | 5300/11615 [00:14<00:17, 363.10 examples/s]
Map:  46%|████▌     | 5290/11615 [00:14<00:17, 360.90 examples/s]
Map:  45%|████▌     | 5238/11615 [00:14<00:17, 361.71 examples/s]
Map:  45%|████▌     | 5244/11615 [00:14<00:17, 362.46 examples/s]
Map:  46%|████▌     | 5300/11615 [00:14<00:17, 362.95 examples/s]
Map:  46%|████▌     | 5340/11615 [00:14<00:17, 360.42 examples/s]
Map:  46%|████▌     | 5340/11615 [00:14<00:17, 359.83 examples/s]
Map:  46%|████▌     | 5329/11615 [00:14<00:17, 364.53 examples/s]
Map:  46%|████▌     | 5293/11615 [00:15<00:17, 361.96 examples/s]
Map:  46%|████▋     | 5380/11615 [00:15<00:17, 361.48 examples/s]
Map:  46%|████▌     | 5292/11615 [00:15<00:17, 360.07 examples/s]
Map:  46%|████▌     | 5353/11615 [00:15<00:17, 357.17 examples/s]
Map:  46%|████▌     | 5300/11615 [00:15<00:17, 359.75 examples/s]
Map:  46%|████▋     | 5380/11615 [00:15<00:17, 360.60 examples/s]
Map:  46%|████▌     | 5366/11615 [00:15<00:17, 361.89 examples/s]
Map:  46%|████▌     | 5330/11615 [00:15<00:17, 358.24 examples/s]
Map:  46%|████▌     | 5329/11615 [00:15<00:17, 360.14 examples/s]
Map:  46%|████▋     | 5392/11615 [00:15<00:17, 363.90 examples/s]
Map:  47%|████▋     | 5438/11615 [00:15<00:17, 362.71 examples/s]
Map:  46%|████▌     | 5369/11615 [00:15<00:17, 362.41 examples/s]
Map:  46%|████▌     | 5352/11615 [00:15<00:17, 353.84 examples/s]
Map:  47%|████▋     | 5418/11615 [00:15<00:17, 355.17 examples/s]
Map:  47%|████▋     | 5437/11615 [00:15<00:17, 361.72 examples/s]
Map:  46%|████▌     | 5366/11615 [00:15<00:17, 358.05 examples/s]
Map:  47%|████▋     | 5480/11615 [00:15<00:16, 369.04 examples/s]
Map:  47%|████▋     | 5447/11615 [00:15<00:16, 363.34 examples/s]
Map:  46%|████▋     | 5392/11615 [00:15<00:17, 360.53 examples/s]
Map:  47%|████▋     | 5477/11615 [00:15<00:16, 369.22 examples/s]
Map:  47%|████▋     | 5460/11615 [00:15<00:16, 367.85 examples/s]
Map:  47%|████▋     | 5402/11615 [00:15<00:17, 357.24 examples/s]
Map:  47%|████▋     | 5420/11615 [00:15<00:17, 349.38 examples/s]
Map:  47%|████▋     | 5487/11615 [00:15<00:16, 370.86 examples/s]
Map:  47%|████▋     | 5499/11615 [00:15<00:16, 368.00 examples/s]
Map:  47%|████▋     | 5439/11615 [00:15<00:17, 358.81 examples/s]
Map:  48%|████▊     | 5533/11615 [00:15<00:16, 360.72 examples/s]
Map:  47%|████▋     | 5462/11615 [00:15<00:16, 364.44 examples/s]
Map:  47%|████▋     | 5447/11615 [00:15<00:17, 360.13 examples/s]
Map:  48%|████▊     | 5530/11615 [00:15<00:16, 358.21 examples/s]
Map:  47%|████▋     | 5480/11615 [00:15<00:16, 365.08 examples/s]
Map:  47%|████▋     | 5487/11615 [00:15<00:16, 367.65 examples/s]
Map:  47%|████▋     | 5500/11615 [00:15<00:16, 360.50 examples/s]
Map:  48%|████▊     | 5540/11615 [00:15<00:17, 355.32 examples/s]
Map:  48%|████▊     | 5550/11615 [00:15<00:17, 354.66 examples/s]
Map:  48%|████▊     | 5570/11615 [00:15<00:20, 301.31 examples/s]
Map:  48%|████▊     | 5571/11615 [00:15<00:20, 298.60 examples/s]
Map:  48%|████▊     | 5533/11615 [00:15<00:17, 356.49 examples/s]
Map:  48%|████▊     | 5588/11615 [00:15<00:19, 315.68 examples/s]
Map:  48%|████▊     | 5609/11615 [00:15<00:19, 313.78 examples/s]
Map:  48%|████▊     | 5619/11615 [00:15<00:19, 304.04 examples/s]
Map:  48%|████▊     | 5533/11615 [00:15<00:20, 300.62 examples/s]
Map:  48%|████▊     | 5549/11615 [00:15<00:20, 299.97 examples/s]
Map:  48%|████▊     | 5591/11615 [00:15<00:19, 305.08 examples/s]
Map:  48%|████▊     | 5570/11615 [00:15<00:17, 354.18 examples/s]
Map:  48%|████▊     | 5623/11615 [00:15<00:18, 323.19 examples/s]
Map:  49%|████▊     | 5657/11615 [00:15<00:18, 320.84 examples/s]
Map:  49%|████▊     | 5645/11615 [00:15<00:18, 322.84 examples/s]
Map:  48%|████▊     | 5585/11615 [00:15<00:19, 311.41 examples/s]
Map:  48%|████▊     | 5629/11615 [00:15<00:18, 319.63 examples/s]
Map:  48%|████▊     | 5570/11615 [00:15<00:19, 311.26 examples/s]
Map:  48%|████▊     | 5609/11615 [00:15<00:16, 356.72 examples/s]
Map:  49%|████▊     | 5661/11615 [00:15<00:17, 335.66 examples/s]
Map:  49%|████▉     | 5697/11615 [00:16<00:17, 338.48 examples/s]
Map:  49%|████▉     | 5686/11615 [00:16<00:17, 343.59 examples/s]
Map:  48%|████▊     | 5620/11615 [00:16<00:18, 317.74 examples/s]
Map:  49%|████▉     | 5670/11615 [00:16<00:17, 334.62 examples/s]
Map:  48%|████▊     | 5609/11615 [00:16<00:18, 323.78 examples/s]
Map:  49%|████▊     | 5645/11615 [00:16<00:16, 355.07 examples/s]
Map:  49%|████▉     | 5701/11615 [00:16<00:16, 351.94 examples/s]
Map:  49%|████▊     | 5659/11615 [00:16<00:17, 334.56 examples/s]
Map:  49%|████▊     | 5645/11615 [00:16<00:18, 330.02 examples/s]
Map:  49%|████▉     | 5710/11615 [00:16<00:17, 345.41 examples/s]
Map:  49%|████▉     | 5686/11615 [00:16<00:16, 368.78 examples/s]
Map:  49%|████▉     | 5740/11615 [00:16<00:17, 342.90 examples/s]
Map:  50%|████▉     | 5750/11615 [00:16<00:17, 336.50 examples/s]
Map:  50%|████▉     | 5753/11615 [00:16<00:16, 348.50 examples/s]
Map:  49%|████▉     | 5700/11615 [00:16<00:17, 347.20 examples/s]
Map:  49%|████▉     | 5686/11615 [00:16<00:16, 349.13 examples/s]
Map:  50%|████▉     | 5790/11615 [00:16<00:17, 341.32 examples/s]
Map:  50%|████▉     | 5761/11615 [00:16<00:17, 341.55 examples/s]
Map:  49%|████▉     | 5739/11615 [00:16<00:16, 360.63 examples/s]
Map:  50%|████▉     | 5794/11615 [00:16<00:16, 346.05 examples/s]
Map:  50%|████▉     | 5790/11615 [00:16<00:16, 348.14 examples/s]
Map:  50%|████▉     | 5751/11615 [00:16<00:17, 342.37 examples/s]
Map:  49%|████▉     | 5739/11615 [00:16<00:16, 348.21 examples/s]
Map:  50%|█████     | 5829/11615 [00:16<00:16, 349.81 examples/s]
Map:  50%|████▉     | 5799/11615 [00:16<00:16, 348.78 examples/s]
Map:  50%|█████     | 5830/11615 [00:16<00:16, 346.87 examples/s]
Map:  50%|█████     | 5829/11615 [00:16<00:16, 354.82 examples/s]
Map:  50%|████▉     | 5790/11615 [00:16<00:16, 351.99 examples/s]
Map:  50%|████▉     | 5789/11615 [00:16<00:16, 347.20 examples/s]
Map:  50%|█████     | 5837/11615 [00:16<00:16, 354.73 examples/s]
Map:  51%|█████     | 5870/11615 [00:16<00:15, 361.85 examples/s]
Map:  51%|█████     | 5873/11615 [00:16<00:15, 362.94 examples/s]
Map:  51%|█████     | 5870/11615 [00:16<00:15, 365.67 examples/s]
Map:  50%|████▉     | 5790/11615 [00:16<00:16, 343.14 examples/s]
Map:  50%|█████     | 5829/11615 [00:16<00:16, 356.45 examples/s]
Map:  50%|█████     | 5825/11615 [00:16<00:16, 347.42 examples/s]
Map:  51%|█████     | 5877/11615 [00:16<00:15, 363.93 examples/s]
Map:  51%|█████     | 5907/11615 [00:16<00:15, 359.93 examples/s]
Map:  50%|█████     | 5829/11615 [00:16<00:16, 349.41 examples/s]
Map:  51%|█████     | 5870/11615 [00:16<00:15, 365.65 examples/s]
Map:  51%|█████     | 5928/11615 [00:16<00:15, 358.13 examples/s]
Map:  51%|█████     | 5924/11615 [00:16<00:15, 360.08 examples/s]
Map:  50%|█████     | 5865/11615 [00:16<00:15, 360.89 examples/s]
Map:  51%|█████     | 5947/11615 [00:16<00:15, 367.72 examples/s]
Map:  51%|█████     | 5930/11615 [00:16<00:15, 356.74 examples/s]
Map:  51%|█████     | 5870/11615 [00:16<00:15, 359.79 examples/s]
Map:  51%|█████▏    | 5965/11615 [00:16<00:15, 358.12 examples/s]
Map:  51%|█████▏    | 5961/11615 [00:16<00:15, 360.78 examples/s]
Map:  51%|█████     | 5923/11615 [00:16<00:15, 360.06 examples/s]
Map:  51%|█████     | 5919/11615 [00:16<00:15, 358.63 examples/s]
Map:  51%|█████▏    | 5968/11615 [00:16<00:15, 361.49 examples/s]
Map:  52%|█████▏    | 5997/11615 [00:16<00:16, 348.96 examples/s]
Map:  51%|█████     | 5923/11615 [00:16<00:15, 355.81 examples/s]
Map:  51%|█████▏    | 5960/11615 [00:16<00:15, 358.76 examples/s]
Map:  51%|█████▏    | 5956/11615 [00:16<00:15, 360.62 examples/s]
Map:  52%|█████▏    | 6014/11615 [00:16<00:15, 353.42 examples/s]
Map:  52%|█████▏    | 6019/11615 [00:16<00:15, 351.70 examples/s]
Map:  52%|█████▏    | 6019/11615 [00:17<00:15, 349.81 examples/s]
Map:  51%|█████▏    | 5960/11615 [00:16<00:15, 355.23 examples/s]
Map:  52%|█████▏    | 6050/11615 [00:17<00:16, 344.50 examples/s]
Map:  52%|█████▏    | 6050/11615 [00:17<00:15, 351.48 examples/s]
Map:  52%|█████▏    | 6055/11615 [00:17<00:15, 351.32 examples/s]
Map:  52%|█████▏    | 6012/11615 [00:17<00:16, 350.13 examples/s]
Map:  52%|█████▏    | 6006/11615 [00:17<00:16, 348.01 examples/s]
Map:  52%|█████▏    | 5996/11615 [00:17<00:15, 352.64 examples/s]
Map:  52%|█████▏    | 6085/11615 [00:17<00:16, 341.19 examples/s]
Map:  52%|█████▏    | 6071/11615 [00:17<00:16, 345.73 examples/s]
Map:  52%|█████▏    | 6050/11615 [00:17<00:15, 348.91 examples/s]
Map:  53%|█████▎    | 6102/11615 [00:17<00:15, 345.56 examples/s]
Map:  53%|█████▎    | 6105/11615 [00:17<00:16, 339.45 examples/s]
Map:  53%|█████▎    | 6122/11615 [00:17<00:15, 347.20 examples/s]
Map:  52%|█████▏    | 6060/11615 [00:17<00:15, 348.62 examples/s]
Map:  53%|█████▎    | 6106/11615 [00:17<00:16, 343.85 examples/s]
Map:  52%|█████▏    | 6050/11615 [00:17<00:16, 345.47 examples/s]
Map:  53%|█████▎    | 6141/11615 [00:17<00:15, 353.97 examples/s]
Map:  53%|█████▎    | 6143/11615 [00:17<00:15, 343.83 examples/s]
Map:  53%|█████▎    | 6102/11615 [00:17<00:16, 342.77 examples/s]
Map:  53%|█████▎    | 6144/11615 [00:17<00:15, 349.92 examples/s]
Map:  53%|█████▎    | 6176/11615 [00:17<00:15, 349.30 examples/s]
Map:  52%|█████▏    | 6085/11615 [00:17<00:16, 341.71 examples/s]
Map:  53%|█████▎    | 6111/11615 [00:17<00:16, 343.23 examples/s]
Map:  53%|█████▎    | 6179/11615 [00:17<00:15, 357.00 examples/s]
Map:  53%|█████▎    | 6180/11615 [00:17<00:15, 346.15 examples/s]
Map:  53%|█████▎    | 6140/11615 [00:17<00:15, 350.31 examples/s]
Map:  53%|█████▎    | 6181/11615 [00:17<00:15, 352.90 examples/s]
Map:  53%|█████▎    | 6122/11615 [00:17<00:15, 347.16 examples/s]
Map:  53%|█████▎    | 6149/11615 [00:17<00:15, 349.91 examples/s]
Map:  54%|█████▎    | 6215/11615 [00:17<00:15, 353.53 examples/s]
Map:  54%|█████▎    | 6230/11615 [00:17<00:15, 350.75 examples/s]
Map:  54%|█████▎    | 6220/11615 [00:17<00:15, 349.09 examples/s]
Map:  53%|█████▎    | 6176/11615 [00:17<00:15, 350.96 examples/s]
Map:  54%|█████▎    | 6220/11615 [00:17<00:15, 352.02 examples/s]
Map:  53%|█████▎    | 6157/11615 [00:17<00:15, 345.45 examples/s]
Map:  53%|█████▎    | 6186/11615 [00:17<00:15, 353.62 examples/s]
Map:  54%|█████▍    | 6252/11615 [00:17<00:15, 356.90 examples/s]
Map:  54%|█████▍    | 6268/11615 [00:17<00:14, 357.03 examples/s]
Map:  54%|█████▍    | 6260/11615 [00:17<00:15, 353.89 examples/s]
Map:  53%|█████▎    | 6195/11615 [00:17<00:15, 353.48 examples/s]
Map:  54%|█████▍    | 6260/11615 [00:17<00:15, 355.84 examples/s]
Map:  54%|█████▎    | 6230/11615 [00:17<00:15, 351.73 examples/s]
Map:  54%|█████▍    | 6290/11615 [00:17<00:14, 359.48 examples/s]
Map:  54%|█████▍    | 6298/11615 [00:17<00:14, 358.27 examples/s]
Map:  54%|█████▎    | 6240/11615 [00:17<00:15, 348.29 examples/s]
Map:  54%|█████▍    | 6323/11615 [00:17<00:14, 355.31 examples/s]
Map:  54%|█████▍    | 6268/11615 [00:17<00:14, 357.28 examples/s]
Map:  54%|█████▍    | 6298/11615 [00:17<00:14, 359.48 examples/s]
Map:  54%|█████▍    | 6329/11615 [00:17<00:14, 365.66 examples/s]
Map:  54%|█████▍    | 6249/11615 [00:17<00:15, 353.09 examples/s]
Map:  55%|█████▍    | 6337/11615 [00:17<00:14, 363.24 examples/s]
Map:  54%|█████▍    | 6280/11615 [00:17<00:14, 357.09 examples/s]
Map:  55%|█████▍    | 6364/11615 [00:17<00:14, 363.57 examples/s]
Map:  55%|█████▍    | 6337/11615 [00:17<00:14, 363.80 examples/s]
Map:  55%|█████▍    | 6367/11615 [00:17<00:14, 366.38 examples/s]
Map:  54%|█████▍    | 6288/11615 [00:17<00:14, 358.52 examples/s]
Map:  54%|█████▍    | 6323/11615 [00:17<00:14, 354.02 examples/s]
Map:  55%|█████▌    | 6392/11615 [00:18<00:14, 359.28 examples/s]
Map:  55%|█████▍    | 6337/11615 [00:18<00:14, 359.83 examples/s]
Map:  55%|█████▌    | 6420/11615 [00:18<00:14, 360.80 examples/s]
Map:  55%|█████▍    | 6363/11615 [00:18<00:14, 364.58 examples/s]
Map:  55%|█████▌    | 6392/11615 [00:18<00:14, 358.92 examples/s]
Map:  55%|█████▌    | 6422/11615 [00:18<00:14, 362.99 examples/s]
Map:  55%|█████▍    | 6342/11615 [00:18<00:14, 355.85 examples/s]
Map:  55%|█████▌    | 6430/11615 [00:18<00:14, 362.02 examples/s]
Map:  56%|█████▌    | 6459/11615 [00:18<00:14, 361.68 examples/s]
Map:  55%|█████▌    | 6430/11615 [00:18<00:14, 361.45 examples/s]
Map:  55%|█████▌    | 6390/11615 [00:18<00:14, 354.63 examples/s]
Map:  56%|█████▌    | 6460/11615 [00:18<00:14, 359.78 examples/s]
Map:  55%|█████▍    | 6379/11615 [00:18<00:14, 356.01 examples/s]
Map:  55%|█████▌    | 6417/11615 [00:18<00:14, 360.10 examples/s]
Map:  56%|█████▌    | 6467/11615 [00:18<00:14, 361.60 examples/s]
Map:  56%|█████▌    | 6467/11615 [00:18<00:14, 360.59 examples/s]
Map:  55%|█████▌    | 6430/11615 [00:18<00:14, 358.82 examples/s]
Map:  56%|█████▌    | 6499/11615 [00:18<00:14, 362.53 examples/s]
Map:  55%|█████▌    | 6417/11615 [00:18<00:14, 360.71 examples/s]
Map:  56%|█████▌    | 6511/11615 [00:18<00:14, 354.80 examples/s]
Map:  56%|█████▌    | 6470/11615 [00:18<00:14, 353.25 examples/s]
Map:  56%|█████▌    | 6520/11615 [00:18<00:14, 355.08 examples/s]
Map:  56%|█████▌    | 6520/11615 [00:18<00:14, 354.25 examples/s]
Map:  56%|█████▋    | 6550/11615 [00:18<00:14, 356.74 examples/s]
Map:  56%|█████▌    | 6485/11615 [00:18<00:14, 359.73 examples/s]
Map:  56%|█████▋    | 6553/11615 [00:18<00:14, 359.27 examples/s]
Map:  56%|█████▌    | 6469/11615 [00:18<00:14, 354.66 examples/s]
Map:  56%|█████▌    | 6508/11615 [00:18<00:14, 356.85 examples/s]
Map:  56%|█████▋    | 6559/11615 [00:18<00:13, 362.52 examples/s]
Map:  56%|█████▋    | 6559/11615 [00:18<00:13, 361.94 examples/s]
Map:  57%|█████▋    | 6590/11615 [00:18<00:13, 359.01 examples/s]
Map:  57%|█████▋    | 6590/11615 [00:18<00:13, 359.87 examples/s]
Map:  56%|█████▌    | 6506/11615 [00:18<00:14, 354.61 examples/s]
Map:  57%|█████▋    | 6596/11615 [00:18<00:13, 363.64 examples/s]
Map:  56%|█████▋    | 6538/11615 [00:18<00:14, 351.77 examples/s]
Map:  57%|█████▋    | 6596/11615 [00:18<00:13, 363.14 examples/s]
Map:  56%|█████▋    | 6562/11615 [00:18<00:14, 354.20 examples/s]
Map:  57%|█████▋    | 6630/11615 [00:18<00:13, 367.01 examples/s]
Map:  57%|█████▋    | 6630/11615 [00:18<00:13, 368.24 examples/s]
Map:  57%|█████▋    | 6635/11615 [00:18<00:13, 361.54 examples/s]
Map:  57%|█████▋    | 6578/11615 [00:18<00:13, 361.60 examples/s]
Map:  56%|█████▋    | 6561/11615 [00:18<00:14, 356.20 examples/s]
Map:  57%|█████▋    | 6600/11615 [00:18<00:14, 358.15 examples/s]
Map:  57%|█████▋    | 6635/11615 [00:18<00:13, 360.95 examples/s]
Map:  57%|█████▋    | 6667/11615 [00:18<00:13, 366.46 examples/s]
Map:  57%|█████▋    | 6667/11615 [00:18<00:13, 368.04 examples/s]
Map:  57%|█████▋    | 6674/11615 [00:18<00:13, 369.04 examples/s]
Map:  57%|█████▋    | 6616/11615 [00:18<00:13, 363.95 examples/s]
Map:  57%|█████▋    | 6600/11615 [00:18<00:13, 358.84 examples/s]
Map:  57%|█████▋    | 6674/11615 [00:18<00:13, 368.57 examples/s]
Map:  57%|█████▋    | 6654/11615 [00:18<00:13, 357.35 examples/s]
Map:  58%|█████▊    | 6722/11615 [00:18<00:13, 362.65 examples/s]
Map:  58%|█████▊    | 6722/11615 [00:18<00:13, 363.80 examples/s]
Map:  57%|█████▋    | 6653/11615 [00:18<00:13, 357.12 examples/s]
Map:  57%|█████▋    | 6637/11615 [00:18<00:13, 359.20 examples/s]
Map:  58%|█████▊    | 6730/11615 [00:18<00:13, 365.38 examples/s]
Map:  58%|█████▊    | 6691/11615 [00:18<00:13, 355.61 examples/s]
Map:  58%|█████▊    | 6730/11615 [00:18<00:13, 362.88 examples/s]
Map:  58%|█████▊    | 6759/11615 [00:18<00:13, 361.68 examples/s]
Map:  58%|█████▊    | 6759/11615 [00:19<00:13, 363.44 examples/s]
Map:  58%|█████▊    | 6690/11615 [00:19<00:13, 358.40 examples/s]
Map:  57%|█████▋    | 6675/11615 [00:19<00:13, 363.59 examples/s]
Map:  58%|█████▊    | 6731/11615 [00:19<00:13, 363.49 examples/s]
Map:  59%|█████▊    | 6797/11615 [00:19<00:13, 364.23 examples/s]
Map:  58%|█████▊    | 6783/11615 [00:19<00:13, 358.66 examples/s]
Map:  59%|█████▊    | 6797/11615 [00:19<00:13, 365.78 examples/s]
Map:  58%|█████▊    | 6730/11615 [00:19<00:13, 364.79 examples/s]
Map:  58%|█████▊    | 6785/11615 [00:19<00:13, 360.97 examples/s]
Map:  58%|█████▊    | 6731/11615 [00:19<00:13, 361.54 examples/s]
Map:  59%|█████▊    | 6822/11615 [00:19<00:13, 363.69 examples/s]
Map:  59%|█████▉    | 6838/11615 [00:19<00:12, 371.59 examples/s]
Map:  59%|█████▉    | 6838/11615 [00:19<00:12, 373.10 examples/s]
Map:  58%|█████▊    | 6785/11615 [00:19<00:13, 360.59 examples/s]
Map:  59%|█████▊    | 6823/11615 [00:19<00:13, 365.41 examples/s]
Map:  58%|█████▊    | 6785/11615 [00:19<00:13, 362.02 examples/s]
Map:  58%|█████▊    | 6785/11615 [00:19<00:13, 359.73 examples/s]
Map:  59%|█████▉    | 6824/11615 [00:19<00:13, 365.62 examples/s]
Map:  59%|█████▉    | 6860/11615 [00:19<00:13, 361.45 examples/s]
Map:  59%|█████▉    | 6893/11615 [00:19<00:12, 368.65 examples/s]
Map:  59%|█████▉    | 6873/11615 [00:19<00:13, 352.39 examples/s]
Map:  59%|█████▉    | 6893/11615 [00:19<00:12, 369.42 examples/s]
Map:  59%|█████▉    | 6824/11615 [00:19<00:13, 366.57 examples/s]
Map:  59%|█████▉    | 6824/11615 [00:19<00:13, 364.74 examples/s]
Map:  60%|█████▉    | 6911/11615 [00:19<00:13, 355.97 examples/s]
Map:  59%|█████▉    | 6900/11615 [00:19<00:12, 363.23 examples/s]
Map:  59%|█████▉    | 6880/11615 [00:19<00:13, 362.74 examples/s]
Map:  60%|█████▉    | 6944/11615 [00:19<00:13, 355.78 examples/s]
Map:  60%|█████▉    | 6944/11615 [00:19<00:13, 356.49 examples/s]
Map:  59%|█████▉    | 6880/11615 [00:19<00:13, 362.17 examples/s]
Map:  59%|█████▉    | 6880/11615 [00:19<00:13, 361.41 examples/s]
Map:  60%|█████▉    | 6918/11615 [00:19<00:12, 362.03 examples/s]
Map:  60%|█████▉    | 6960/11615 [00:19<00:13, 342.25 examples/s]
Map:  60%|█████▉    | 6950/11615 [00:19<00:13, 348.77 examples/s]
Map:  60%|█████▉    | 6918/11615 [00:19<00:12, 361.58 examples/s]
Map:  60%|██████    | 6999/11615 [00:19<00:12, 355.51 examples/s]
Map:  60%|██████    | 6999/11615 [00:19<00:12, 356.03 examples/s]
Map:  60%|█████▉    | 6918/11615 [00:19<00:13, 360.95 examples/s]
Map:  60%|██████    | 6999/11615 [00:19<00:13, 350.12 examples/s]
Map:  60%|█████▉    | 6967/11615 [00:19<00:13, 345.42 examples/s]
Map:  60%|██████    | 7007/11615 [00:19<00:12, 355.68 examples/s]
Map:  60%|█████▉    | 6967/11615 [00:19<00:13, 344.68 examples/s]
Map:  61%|██████    | 7051/11615 [00:19<00:12, 351.45 examples/s]
Map:  61%|██████    | 7051/11615 [00:19<00:12, 352.30 examples/s]
Map:  60%|█████▉    | 6967/11615 [00:19<00:13, 344.48 examples/s]
Map:  60%|██████    | 7006/11615 [00:19<00:12, 354.82 examples/s]
Map:  61%|██████    | 7051/11615 [00:19<00:13, 347.26 examples/s]
Map:  60%|██████    | 7006/11615 [00:19<00:13, 353.73 examples/s]
Map:  61%|██████    | 7088/11615 [00:19<00:12, 352.45 examples/s]
Map:  61%|██████    | 7062/11615 [00:19<00:12, 357.17 examples/s]
Map:  61%|██████    | 7088/11615 [00:19<00:12, 353.60 examples/s]
Map:  60%|██████    | 7007/11615 [00:19<00:13, 353.61 examples/s]
Map:  61%|██████    | 7088/11615 [00:19<00:12, 348.97 examples/s]
Map:  61%|██████    | 7060/11615 [00:20<00:12, 353.92 examples/s]
Map:  61%|██████▏   | 7124/11615 [00:20<00:12, 350.70 examples/s]
Map:  61%|██████    | 7060/11615 [00:20<00:12, 352.71 examples/s]
Map:  61%|██████▏   | 7142/11615 [00:20<00:12, 351.44 examples/s]
Map:  61%|██████    | 7112/11615 [00:20<00:13, 346.35 examples/s]
Map:  61%|██████    | 7062/11615 [00:20<00:12, 355.25 examples/s]
Map:  61%|██████▏   | 7142/11615 [00:20<00:12, 349.03 examples/s]
Map:  62%|██████▏   | 7161/11615 [00:20<00:12, 351.69 examples/s]
Map:  61%|██████    | 7110/11615 [00:20<00:13, 345.46 examples/s]
Map:  62%|██████▏   | 7180/11615 [00:20<00:12, 354.63 examples/s]
Map:  62%|██████▏   | 7149/11615 [00:20<00:12, 349.41 examples/s]
Map:  61%|██████    | 7110/11615 [00:20<00:13, 344.53 examples/s]
Map:  62%|██████▏   | 7180/11615 [00:20<00:12, 352.69 examples/s]
Map:  62%|██████▏   | 7201/11615 [00:20<00:12, 356.27 examples/s]
Map:  62%|██████▏   | 7147/11615 [00:20<00:12, 347.43 examples/s]
Map:  61%|██████    | 7112/11615 [00:20<00:13, 344.10 examples/s]
Map:  62%|██████▏   | 7219/11615 [00:20<00:12, 358.39 examples/s]
Map:  62%|██████▏   | 7188/11615 [00:20<00:12, 356.98 examples/s]
Map:  62%|██████▏   | 7147/11615 [00:20<00:12, 346.99 examples/s]
Map:  62%|██████▏   | 7219/11615 [00:20<00:12, 356.98 examples/s]
Map:  62%|██████▏   | 7238/11615 [00:20<00:12, 357.25 examples/s]
Map:  62%|██████▏   | 7149/11615 [00:20<00:12, 347.69 examples/s]
Map:  62%|██████▏   | 7186/11615 [00:20<00:12, 354.29 examples/s]
Map:  62%|██████▏   | 7186/11615 [00:20<00:12, 354.05 examples/s]
Map:  63%|██████▎   | 7270/11615 [00:20<00:12, 347.88 examples/s]
Map:  62%|██████▏   | 7240/11615 [00:20<00:12, 348.12 examples/s]
Map:  63%|██████▎   | 7274/11615 [00:20<00:12, 355.69 examples/s]
Map:  62%|██████▏   | 7187/11615 [00:20<00:12, 354.74 examples/s]
Map:  62%|██████▏   | 7223/11615 [00:20<00:12, 351.63 examples/s]
Map:  63%|██████▎   | 7270/11615 [00:20<00:12, 346.67 examples/s]
Map:  62%|██████▏   | 7223/11615 [00:20<00:12, 351.88 examples/s]
Map:  63%|██████▎   | 7309/11615 [00:20<00:12, 356.09 examples/s]
Map:  63%|██████▎   | 7279/11615 [00:20<00:12, 353.02 examples/s]
Map:  63%|██████▎   | 7311/11615 [00:20<00:12, 355.98 examples/s]
Map:  62%|██████▏   | 7223/11615 [00:20<00:12, 351.06 examples/s]
Map:  63%|██████▎   | 7260/11615 [00:20<00:13, 322.81 examples/s]
Map:  63%|██████▎   | 7312/11615 [00:20<00:14, 302.42 examples/s]
Map:  63%|██████▎   | 7349/11615 [00:20<00:13, 323.61 examples/s]
Map:  63%|██████▎   | 7351/11615 [00:20<00:13, 317.19 examples/s]
Map:  63%|██████▎   | 7323/11615 [00:20<00:13, 309.43 examples/s]
Map:  63%|██████▎   | 7270/11615 [00:20<00:14, 305.03 examples/s]
Map:  63%|██████▎   | 7310/11615 [00:20<00:13, 325.13 examples/s]
Map:  63%|██████▎   | 7349/11615 [00:20<00:13, 314.24 examples/s]
Map:  63%|██████▎   | 7275/11615 [00:20<00:14, 306.87 examples/s]
Map:  64%|██████▎   | 7390/11615 [00:20<00:12, 337.20 examples/s]
Map:  64%|██████▎   | 7390/11615 [00:20<00:12, 335.49 examples/s]
Map:  63%|██████▎   | 7309/11615 [00:20<00:13, 322.97 examples/s]
Map:  63%|██████▎   | 7360/11615 [00:20<00:13, 319.87 examples/s]
Map:  63%|██████▎   | 7348/11615 [00:20<00:12, 332.39 examples/s]
Map:  64%|██████▎   | 7389/11615 [00:20<00:12, 332.05 examples/s]
Map:  63%|██████▎   | 7312/11615 [00:20<00:13, 317.31 examples/s]
Map:  64%|██████▍   | 7430/11615 [00:20<00:12, 345.19 examples/s]
Map:  63%|██████▎   | 7343/11615 [00:20<00:13, 326.50 examples/s]
Map:  64%|██████▍   | 7430/11615 [00:20<00:12, 344.67 examples/s]
Map:  64%|██████▎   | 7400/11615 [00:20<00:12, 336.66 examples/s]
Map:  64%|██████▎   | 7387/11615 [00:20<00:12, 344.53 examples/s]
Map:  64%|██████▍   | 7427/11615 [00:20<00:12, 342.28 examples/s]
Map:  63%|██████▎   | 7348/11615 [00:20<00:13, 326.49 examples/s]
Map:  64%|██████▍   | 7468/11615 [00:21<00:11, 352.78 examples/s]
Map:  64%|██████▎   | 7382/11615 [00:21<00:12, 340.53 examples/s]
Map:  64%|██████▍   | 7468/11615 [00:21<00:11, 352.56 examples/s]
Map:  64%|██████▍   | 7439/11615 [00:21<00:12, 345.85 examples/s]
Map:  64%|██████▍   | 7423/11615 [00:21<00:12, 347.43 examples/s]
Map:  64%|██████▎   | 7387/11615 [00:21<00:12, 339.70 examples/s]
Map:  65%|██████▍   | 7506/11615 [00:21<00:11, 359.05 examples/s]
Map:  64%|██████▍   | 7420/11615 [00:21<00:12, 347.72 examples/s]
Map:  65%|██████▍   | 7507/11615 [00:21<00:11, 359.66 examples/s]
Map:  64%|██████▍   | 7484/11615 [00:21<00:11, 351.55 examples/s]
Map:  64%|██████▍   | 7476/11615 [00:21<00:11, 350.30 examples/s]
Map:  64%|██████▍   | 7460/11615 [00:21<00:11, 349.58 examples/s]
Map:  64%|██████▍   | 7423/11615 [00:21<00:12, 343.90 examples/s]
Map:  64%|██████▍   | 7457/11615 [00:21<00:11, 352.68 examples/s]
Map:  65%|██████▌   | 7561/11615 [00:21<00:11, 359.03 examples/s]
Map:  65%|██████▍   | 7527/11615 [00:21<00:11, 345.24 examples/s]
Map:  65%|██████▌   | 7564/11615 [00:21<00:11, 362.65 examples/s]
Map:  65%|██████▍   | 7500/11615 [00:21<00:11, 358.29 examples/s]
Map:  65%|██████▍   | 7535/11615 [00:21<00:11, 342.87 examples/s]
Map:  64%|██████▍   | 7460/11615 [00:21<00:11, 347.08 examples/s]
Map:  65%|██████▍   | 7494/11615 [00:21<00:11, 357.28 examples/s]
Map:  65%|██████▌   | 7600/11615 [00:21<00:11, 357.94 examples/s]
Map:  65%|██████▌   | 7578/11615 [00:21<00:11, 361.85 examples/s]
Map:  65%|██████▌   | 7570/11615 [00:21<00:11, 362.08 examples/s]
Map:  65%|██████▍   | 7500/11615 [00:21<00:11, 357.84 examples/s]
Map:  66%|██████▌   | 7619/11615 [00:21<00:11, 358.79 examples/s]
Map:  65%|██████▌   | 7554/11615 [00:21<00:11, 352.08 examples/s]
Map:  66%|██████▌   | 7637/11615 [00:21<00:11, 358.75 examples/s]
Map:  65%|██████▍   | 7546/11615 [00:21<00:11, 348.37 examples/s]
Map:  66%|██████▌   | 7657/11615 [00:21<00:10, 362.37 examples/s]
Map:  66%|██████▌   | 7623/11615 [00:21<00:11, 355.40 examples/s]
Map:  66%|██████▌   | 7631/11615 [00:21<00:11, 354.97 examples/s]
Map:  65%|██████▌   | 7592/11615 [00:21<00:11, 357.01 examples/s]
Map:  65%|██████▌   | 7554/11615 [00:21<00:11, 352.44 examples/s]
Map:  66%|██████▌   | 7674/11615 [00:21<00:10, 360.51 examples/s]
Map:  65%|██████▌   | 7585/11615 [00:21<00:11, 357.78 examples/s]
Map:  66%|██████▌   | 7660/11615 [00:21<00:11, 358.48 examples/s]
Map:  66%|██████▌   | 7670/11615 [00:21<00:10, 359.19 examples/s]
Map:  66%|██████▌   | 7630/11615 [00:21<00:11, 354.98 examples/s]
Map:  65%|██████▌   | 7592/11615 [00:21<00:11, 357.43 examples/s]
Map:  66%|██████▋   | 7710/11615 [00:21<00:11, 354.37 examples/s]
Map:  67%|██████▋   | 7730/11615 [00:21<00:10, 356.27 examples/s]
Map:  66%|██████▌   | 7640/11615 [00:21<00:11, 354.99 examples/s]
Map:  66%|██████▋   | 7699/11615 [00:21<00:10, 362.53 examples/s]
Map:  66%|██████▌   | 7668/11615 [00:21<00:11, 358.09 examples/s]
Map:  66%|██████▌   | 7630/11615 [00:21<00:11, 355.74 examples/s]
Map:  66%|██████▋   | 7723/11615 [00:21<00:10, 354.43 examples/s]
Map:  67%|██████▋   | 7748/11615 [00:21<00:10, 356.97 examples/s]
Map:  66%|██████▌   | 7680/11615 [00:21<00:10, 358.22 examples/s]
Map:  66%|██████▌   | 7668/11615 [00:21<00:10, 359.17 examples/s]
Map:  67%|██████▋   | 7781/11615 [00:21<00:10, 349.07 examples/s]
Map:  67%|██████▋   | 7750/11615 [00:21<00:11, 350.84 examples/s]
Map:  66%|██████▋   | 7720/11615 [00:21<00:11, 348.49 examples/s]
Map:  67%|██████▋   | 7774/11615 [00:21<00:11, 348.57 examples/s]
Map:  67%|██████▋   | 7800/11615 [00:21<00:10, 347.53 examples/s]
Map:  67%|██████▋   | 7819/11615 [00:21<00:10, 349.75 examples/s]
Map:  67%|██████▋   | 7731/11615 [00:22<00:11, 349.88 examples/s]
Map:  67%|██████▋   | 7786/11615 [00:22<00:10, 348.56 examples/s]
Map:  67%|██████▋   | 7756/11615 [00:22<00:11, 349.44 examples/s]
Map:  66%|██████▋   | 7720/11615 [00:22<00:11, 349.47 examples/s]
Map:  67%|██████▋   | 7810/11615 [00:22<00:10, 346.17 examples/s]
Map:  67%|██████▋   | 7838/11615 [00:22<00:10, 354.36 examples/s]
Map:  68%|██████▊   | 7855/11615 [00:22<00:10, 351.62 examples/s]
Map:  67%|██████▋   | 7769/11615 [00:22<00:11, 349.46 examples/s]
Map:  67%|██████▋   | 7756/11615 [00:22<00:11, 350.17 examples/s]
Map:  68%|██████▊   | 7849/11615 [00:22<00:10, 355.34 examples/s]
Map:  67%|██████▋   | 7840/11615 [00:22<00:10, 345.95 examples/s]
Map:  67%|██████▋   | 7807/11615 [00:22<00:11, 345.03 examples/s]
Map:  68%|██████▊   | 7890/11615 [00:22<00:10, 344.58 examples/s]
Map:  68%|██████▊   | 7909/11615 [00:22<00:10, 350.28 examples/s]
Map:  67%|██████▋   | 7820/11615 [00:22<00:11, 342.93 examples/s]
Map:  68%|██████▊   | 7842/11615 [00:22<00:10, 345.30 examples/s]
Map:  68%|██████▊   | 7878/11615 [00:22<00:10, 347.18 examples/s]
Map:  67%|██████▋   | 7808/11615 [00:22<00:10, 346.43 examples/s]
Map:  68%|██████▊   | 7900/11615 [00:22<00:10, 346.51 examples/s]
Map:  68%|██████▊   | 7927/11615 [00:22<00:10, 349.18 examples/s]
Map:  68%|██████▊   | 7878/11615 [00:22<00:10, 343.82 examples/s]
Map:  68%|██████▊   | 7914/11615 [00:22<00:10, 349.77 examples/s]
Map:  68%|██████▊   | 7860/11615 [00:22<00:10, 344.92 examples/s]
Map:  68%|██████▊   | 7844/11615 [00:22<00:10, 347.42 examples/s]
Map:  69%|██████▊   | 7960/11615 [00:22<00:10, 345.82 examples/s]
Map:  68%|██████▊   | 7935/11615 [00:22<00:10, 343.05 examples/s]
Map:  69%|██████▊   | 7964/11615 [00:22<00:10, 351.51 examples/s]
Map:  68%|██████▊   | 7914/11615 [00:22<00:10, 346.60 examples/s]
Map:  68%|██████▊   | 7879/11615 [00:22<00:10, 344.76 examples/s]
Map:  69%|██████▊   | 7970/11615 [00:22<00:10, 342.45 examples/s]
Map:  69%|██████▉   | 8000/11615 [00:22<00:10, 345.67 examples/s]
Map:  68%|██████▊   | 7914/11615 [00:22<00:10, 346.97 examples/s]
Map:  69%|██████▊   | 7968/11615 [00:22<00:10, 344.37 examples/s]
Map:  69%|██████▉   | 8013/11615 [00:22<00:10, 346.35 examples/s]
Map:  68%|██████▊   | 7915/11615 [00:22<00:10, 346.72 examples/s]
Map:  69%|██████▉   | 8006/11615 [00:22<00:10, 345.49 examples/s]
Map:  69%|██████▊   | 7967/11615 [00:22<00:10, 344.18 examples/s]
Map:  69%|██████▉   | 8003/11615 [00:22<00:10, 344.14 examples/s]
Map:  69%|██████▉   | 8038/11615 [00:22<00:10, 350.62 examples/s]
Map:  69%|██████▊   | 7967/11615 [00:22<00:10, 344.76 examples/s]
Map:  69%|██████▉   | 8069/11615 [00:22<00:10, 353.59 examples/s]
Map:  69%|██████▉   | 8041/11615 [00:22<00:10, 344.29 examples/s]
Map:  70%|██████▉   | 8079/11615 [00:22<00:09, 363.72 examples/s]
Map:  69%|██████▉   | 8040/11615 [00:22<00:10, 343.62 examples/s]
Map:  69%|██████▊   | 7968/11615 [00:22<00:10, 341.20 examples/s]
Map:  69%|██████▉   | 8020/11615 [00:22<00:10, 341.74 examples/s]
Map:  70%|██████▉   | 8082/11615 [00:22<00:09, 360.22 examples/s]
Map:  69%|██████▉   | 8020/11615 [00:22<00:10, 341.88 examples/s]
Map:  70%|██████▉   | 8081/11615 [00:22<00:09, 357.96 examples/s]
Map:  70%|██████▉   | 8122/11615 [00:22<00:10, 348.84 examples/s]
Map:  69%|██████▉   | 8056/11615 [00:22<00:10, 344.15 examples/s]
Map:  70%|██████▉   | 8130/11615 [00:22<00:09, 351.94 examples/s]
Map:  69%|██████▉   | 8021/11615 [00:22<00:10, 343.22 examples/s]
Map:  70%|██████▉   | 8119/11615 [00:22<00:09, 355.18 examples/s]
Map:  69%|██████▉   | 8056/11615 [00:22<00:10, 343.67 examples/s]
Map:  70%|███████   | 8157/11615 [00:22<00:09, 347.71 examples/s]
Map:  70%|██████▉   | 8092/11615 [00:23<00:10, 345.70 examples/s]
Map:  70%|███████   | 8166/11615 [00:23<00:09, 350.25 examples/s]
Map:  70%|███████   | 8135/11615 [00:23<00:09, 354.82 examples/s]
Map:  69%|██████▉   | 8057/11615 [00:22<00:10, 346.18 examples/s]
Map:  70%|██████▉   | 8092/11615 [00:23<00:10, 344.71 examples/s]
Map:  71%|███████   | 8196/11615 [00:23<00:09, 357.48 examples/s]
Map:  70%|███████   | 8172/11615 [00:23<00:09, 347.62 examples/s]
Map:  70%|██████▉   | 8130/11615 [00:23<00:10, 347.88 examples/s]
Map:  71%|███████   | 8203/11615 [00:23<00:09, 353.03 examples/s]
Map:  70%|██████▉   | 8093/11615 [00:23<00:10, 347.86 examples/s]
Map:  70%|██████▉   | 8130/11615 [00:23<00:10, 347.01 examples/s]
Map:  71%|███████   | 8190/11615 [00:23<00:09, 351.63 examples/s]
Map:  71%|███████   | 8209/11615 [00:23<00:09, 350.96 examples/s]
Map:  70%|███████   | 8165/11615 [00:23<00:10, 344.94 examples/s]
Map:  71%|███████   | 8243/11615 [00:23<00:09, 362.06 examples/s]
Map:  70%|██████▉   | 8130/11615 [00:23<00:09, 348.63 examples/s]
Map:  71%|███████   | 8250/11615 [00:23<00:09, 353.21 examples/s]
Map:  70%|███████   | 8165/11615 [00:23<00:10, 344.29 examples/s]
Map:  71%|███████   | 8228/11615 [00:23<00:09, 355.18 examples/s]
Map:  71%|███████   | 8248/11615 [00:23<00:09, 358.20 examples/s]
Map:  71%|███████   | 8201/11615 [00:23<00:09, 347.04 examples/s]
Map:  71%|███████▏  | 8280/11615 [00:23<00:09, 355.26 examples/s]
Map:  70%|███████   | 8165/11615 [00:23<00:10, 344.60 examples/s]
Map:  71%|███████▏  | 8290/11615 [00:23<00:09, 357.67 examples/s]
Map:  71%|███████   | 8201/11615 [00:23<00:09, 346.74 examples/s]
Map:  71%|███████   | 8264/11615 [00:23<00:09, 353.36 examples/s]
Map:  71%|███████   | 8240/11615 [00:23<00:09, 354.69 examples/s]
Map:  71%|███████   | 8201/11615 [00:23<00:09, 346.40 examples/s]
Map:  72%|███████▏  | 8320/11615 [00:23<00:09, 359.53 examples/s]
Map:  72%|███████▏  | 8328/11615 [00:23<00:09, 359.73 examples/s]
Map:  71%|███████▏  | 8300/11615 [00:23<00:09, 352.27 examples/s]
Map:  71%|███████▏  | 8300/11615 [00:23<00:09, 353.86 examples/s]
Map:  71%|███████   | 8240/11615 [00:23<00:09, 354.64 examples/s]
Map:  71%|███████▏  | 8276/11615 [00:23<00:09, 353.90 examples/s]
Map:  71%|███████   | 8240/11615 [00:23<00:09, 353.89 examples/s]
Map:  72%|███████▏  | 8357/11615 [00:23<00:09, 361.12 examples/s]
Map:  72%|███████▏  | 8366/11615 [00:23<00:08, 362.66 examples/s]
Map:  72%|███████▏  | 8338/11615 [00:23<00:09, 355.23 examples/s]
Map:  71%|███████▏  | 8276/11615 [00:23<00:09, 354.03 examples/s]
Map:  72%|███████▏  | 8338/11615 [00:23<00:09, 357.45 examples/s]
Map:  72%|███████▏  | 8312/11615 [00:23<00:09, 354.41 examples/s]
Map:  71%|███████▏  | 8276/11615 [00:23<00:09, 352.98 examples/s]
Map:  72%|███████▏  | 8396/11615 [00:23<00:08, 364.94 examples/s]
Map:  72%|███████▏  | 8375/11615 [00:23<00:09, 357.30 examples/s]
Map:  72%|███████▏  | 8312/11615 [00:23<00:09, 354.48 examples/s]
Map:  72%|███████▏  | 8376/11615 [00:23<00:08, 360.48 examples/s]
Map:  72%|███████▏  | 8420/11615 [00:23<00:08, 357.42 examples/s]
Map:  72%|███████▏  | 8349/11615 [00:23<00:09, 355.76 examples/s]
Map:  72%|███████▏  | 8312/11615 [00:23<00:09, 353.80 examples/s]
Map:  72%|███████▏  | 8413/11615 [00:23<00:08, 362.31 examples/s]
Map:  72%|███████▏  | 8349/11615 [00:23<00:09, 355.80 examples/s]
Map:  72%|███████▏  | 8415/11615 [00:23<00:08, 363.70 examples/s]
Map:  73%|███████▎  | 8457/11615 [00:23<00:08, 356.39 examples/s]
Map:  73%|███████▎  | 8451/11615 [00:23<00:08, 356.99 examples/s]
Map:  72%|███████▏  | 8386/11615 [00:23<00:09, 357.57 examples/s]
Map:  72%|███████▏  | 8349/11615 [00:23<00:09, 355.30 examples/s]
Map:  72%|███████▏  | 8386/11615 [00:23<00:09, 357.14 examples/s]
Map:  73%|███████▎  | 8487/11615 [00:23<00:08, 353.04 examples/s]
Map:  73%|███████▎  | 8464/11615 [00:23<00:09, 349.74 examples/s]
Map:  72%|███████▏  | 8386/11615 [00:23<00:09, 357.06 examples/s]
Map:  73%|███████▎  | 8468/11615 [00:23<00:08, 355.42 examples/s]
Map:  73%|███████▎  | 8509/11615 [00:23<00:08, 347.65 examples/s]
Map:  73%|███████▎  | 8438/11615 [00:23<00:09, 350.53 examples/s]
Map:  73%|███████▎  | 8438/11615 [00:24<00:09, 350.92 examples/s]
Map:  73%|███████▎  | 8422/11615 [00:24<00:09, 352.38 examples/s]
Map:  74%|███████▎  | 8539/11615 [00:24<00:08, 346.63 examples/s]
Map:  73%|███████▎  | 8515/11615 [00:24<00:09, 342.50 examples/s]
Map:  73%|███████▎  | 8475/11615 [00:24<00:08, 353.55 examples/s]
Map:  73%|███████▎  | 8518/11615 [00:24<00:08, 345.84 examples/s]
Map:  74%|███████▎  | 8557/11615 [00:24<00:09, 334.90 examples/s]
Map:  73%|███████▎  | 8475/11615 [00:24<00:08, 354.76 examples/s]
Map:  73%|███████▎  | 8460/11615 [00:24<00:09, 349.25 examples/s]
Map:  74%|███████▍  | 8585/11615 [00:24<00:09, 328.87 examples/s]
Map:  74%|███████▎  | 8562/11615 [00:24<00:09, 330.04 examples/s]
Map:  73%|███████▎  | 8524/11615 [00:24<00:09, 338.30 examples/s]
Map:  74%|███████▎  | 8565/11615 [00:24<00:09, 331.28 examples/s]
Map:  74%|███████▍  | 8608/11615 [00:24<00:09, 332.03 examples/s]
Map:  73%|███████▎  | 8524/11615 [00:24<00:09, 339.12 examples/s]
Map:  73%|███████▎  | 8512/11615 [00:24<00:09, 342.57 examples/s]
Map:  74%|███████▍  | 8620/11615 [00:24<00:09, 330.31 examples/s]
Map:  74%|███████▍  | 8596/11615 [00:24<00:09, 331.01 examples/s]
Map:  74%|███████▍  | 8599/11615 [00:24<00:09, 331.32 examples/s]
Map:  74%|███████▍  | 8570/11615 [00:24<00:09, 324.73 examples/s]
Map:  75%|███████▍  | 8658/11615 [00:24<00:08, 330.86 examples/s]
Map:  75%|███████▍  | 8654/11615 [00:24<00:08, 329.84 examples/s]
Map:  74%|███████▎  | 8547/11615 [00:24<00:09, 336.66 examples/s]
Map:  74%|███████▍  | 8630/11615 [00:24<00:09, 329.21 examples/s]
Map:  74%|███████▍  | 8571/11615 [00:24<00:09, 325.61 examples/s]
Map:  74%|███████▍  | 8633/11615 [00:24<00:09, 330.35 examples/s]
Map:  74%|███████▍  | 8605/11615 [00:24<00:09, 329.19 examples/s]
Map:  75%|███████▍  | 8692/11615 [00:24<00:08, 331.59 examples/s]
Map:  75%|███████▍  | 8690/11615 [00:24<00:08, 332.04 examples/s]
Map:  74%|███████▍  | 8606/11615 [00:24<00:09, 329.77 examples/s]
Map:  75%|███████▍  | 8667/11615 [00:24<00:08, 329.37 examples/s]
Map:  74%|███████▍  | 8594/11615 [00:24<00:09, 324.97 examples/s]
Map:  75%|███████▍  | 8680/11615 [00:24<00:08, 327.83 examples/s]
Map:  74%|███████▍  | 8639/11615 [00:24<00:09, 329.96 examples/s]
Map:  75%|███████▌  | 8734/11615 [00:24<00:08, 352.98 examples/s]
Map:  75%|███████▌  | 8732/11615 [00:24<00:08, 354.33 examples/s]
Map:  75%|███████▍  | 8704/11615 [00:24<00:08, 338.57 examples/s]
Map:  74%|███████▍  | 8640/11615 [00:24<00:09, 325.95 examples/s]
Map:  74%|███████▍  | 8630/11615 [00:24<00:09, 326.81 examples/s]
Map:  75%|███████▌  | 8722/11615 [00:24<00:08, 346.00 examples/s]
Map:  76%|███████▌  | 8770/11615 [00:24<00:08, 348.05 examples/s]
Map:  75%|███████▌  | 8769/11615 [00:24<00:08, 354.26 examples/s]
Map:  75%|███████▍  | 8689/11615 [00:24<00:08, 329.75 examples/s]
Map:  75%|███████▌  | 8746/11615 [00:24<00:08, 358.49 examples/s]
Map:  75%|███████▍  | 8673/11615 [00:24<00:09, 324.57 examples/s]
Map:  75%|███████▍  | 8663/11615 [00:24<00:09, 325.67 examples/s]
Map:  75%|███████▌  | 8760/11615 [00:24<00:08, 352.84 examples/s]
Map:  76%|███████▌  | 8809/11615 [00:24<00:07, 354.95 examples/s]
Map:  76%|███████▌  | 8806/11615 [00:24<00:07, 357.18 examples/s]
Map:  75%|███████▌  | 8730/11615 [00:24<00:08, 346.18 examples/s]
Map:  75%|███████▌  | 8712/11615 [00:24<00:08, 339.99 examples/s]
Map:  75%|███████▍  | 8700/11615 [00:24<00:08, 333.05 examples/s]
Map:  76%|███████▌  | 8796/11615 [00:24<00:08, 350.94 examples/s]
Map:  76%|███████▌  | 8799/11615 [00:24<00:07, 354.36 examples/s]
Map:  76%|███████▌  | 8848/11615 [00:24<00:07, 361.00 examples/s]
Map:  76%|███████▌  | 8842/11615 [00:24<00:07, 355.08 examples/s]
Map:  75%|███████▌  | 8767/11615 [00:24<00:08, 347.68 examples/s]
Map:  75%|███████▌  | 8752/11615 [00:24<00:08, 354.72 examples/s]
Map:  75%|███████▌  | 8743/11615 [00:24<00:08, 356.67 examples/s]
Map:  76%|███████▌  | 8833/11615 [00:25<00:07, 353.82 examples/s]
Map:  76%|███████▌  | 8836/11615 [00:25<00:07, 354.46 examples/s]
Map:  77%|███████▋  | 8887/11615 [00:25<00:07, 367.10 examples/s]
Map:  76%|███████▋  | 8882/11615 [00:25<00:07, 366.30 examples/s]
Map:  76%|███████▌  | 8804/11615 [00:25<00:08, 349.66 examples/s]
Map:  76%|███████▋  | 8872/11615 [00:25<00:07, 362.69 examples/s]
Map:  76%|███████▋  | 8875/11615 [00:25<00:07, 362.47 examples/s]
Map:  76%|███████▌  | 8806/11615 [00:25<00:07, 352.66 examples/s]
Map:  76%|███████▌  | 8796/11615 [00:25<00:08, 350.18 examples/s]
Map:  76%|███████▌  | 8840/11615 [00:25<00:07, 348.22 examples/s]
Map:  77%|███████▋  | 8939/11615 [00:25<00:07, 353.45 examples/s]
Map:  77%|███████▋  | 8935/11615 [00:25<00:07, 355.96 examples/s]
Map:  76%|███████▌  | 8833/11615 [00:25<00:07, 352.39 examples/s]
Map:  77%|███████▋  | 8922/11615 [00:25<00:07, 350.05 examples/s]
Map:  76%|███████▋  | 8881/11615 [00:25<00:07, 361.74 examples/s]
Map:  76%|███████▋  | 8861/11615 [00:25<00:07, 354.24 examples/s]
Map:  77%|███████▋  | 8928/11615 [00:25<00:07, 354.06 examples/s]
Map:  77%|███████▋  | 8991/11615 [00:25<00:07, 348.35 examples/s]
Map:  77%|███████▋  | 8989/11615 [00:25<00:07, 354.62 examples/s]
Map:  76%|███████▋  | 8872/11615 [00:25<00:07, 360.52 examples/s]
Map:  77%|███████▋  | 8960/11615 [00:25<00:07, 351.36 examples/s]
Map:  77%|███████▋  | 8964/11615 [00:25<00:07, 352.46 examples/s]
Map:  77%|███████▋  | 8901/11615 [00:25<00:07, 360.38 examples/s]
Map:  77%|███████▋  | 8932/11615 [00:25<00:07, 350.05 examples/s]
Map:  78%|███████▊  | 9027/11615 [00:25<00:07, 347.39 examples/s]
Map:  77%|███████▋  | 8997/11615 [00:25<00:07, 349.23 examples/s]
Map:  78%|███████▊  | 9039/11615 [00:25<00:07, 346.05 examples/s]
Map:  77%|███████▋  | 8922/11615 [00:25<00:07, 348.29 examples/s]
Map:  77%|███████▋  | 8970/11615 [00:25<00:07, 348.95 examples/s]
Map:  78%|███████▊  | 9017/11615 [00:25<00:07, 349.10 examples/s]
Map:  77%|███████▋  | 8952/11615 [00:25<00:07, 349.02 examples/s]
Map:  78%|███████▊  | 9081/11615 [00:25<00:07, 346.76 examples/s]
Map:  78%|███████▊  | 9050/11615 [00:25<00:07, 346.45 examples/s]
Map:  77%|███████▋  | 8989/11615 [00:25<00:07, 349.18 examples/s]
Map:  78%|███████▊  | 9092/11615 [00:25<00:07, 346.24 examples/s]
Map:  77%|███████▋  | 8973/11615 [00:25<00:07, 340.52 examples/s]
Map:  78%|███████▊  | 9022/11615 [00:25<00:07, 343.87 examples/s]
Map:  78%|███████▊  | 9062/11615 [00:25<00:07, 326.67 examples/s]
Map:  79%|███████▊  | 9120/11615 [00:25<00:07, 352.50 examples/s]
Map:  78%|███████▊  | 9086/11615 [00:25<00:07, 347.30 examples/s]
Map:  79%|███████▊  | 9130/11615 [00:25<00:07, 352.73 examples/s]
Map:  78%|███████▊  | 9009/11615 [00:25<00:07, 341.15 examples/s]
Map:  78%|███████▊  | 9057/11615 [00:25<00:07, 342.44 examples/s]
Map:  78%|███████▊  | 9100/11615 [00:25<00:07, 338.09 examples/s]
Map:  78%|███████▊  | 9039/11615 [00:25<00:07, 341.01 examples/s]
Map:  79%|███████▉  | 9158/11615 [00:25<00:06, 356.82 examples/s]
Map:  79%|███████▊  | 9123/11615 [00:25<00:07, 351.37 examples/s]
Map:  79%|███████▉  | 9167/11615 [00:25<00:06, 352.34 examples/s]
Map:  78%|███████▊  | 9095/11615 [00:25<00:07, 347.57 examples/s]
Map:  79%|███████▊  | 9140/11615 [00:25<00:07, 350.14 examples/s]
Map:  78%|███████▊  | 9075/11615 [00:25<00:07, 341.35 examples/s]
Map:  79%|███████▉  | 9194/11615 [00:25<00:06, 354.04 examples/s]
Map:  78%|███████▊  | 9060/11615 [00:25<00:07, 335.42 examples/s]
Map:  79%|███████▉  | 9160/11615 [00:25<00:06, 353.39 examples/s]
Map:  79%|███████▉  | 9204/11615 [00:25<00:06, 354.23 examples/s]
Map:  79%|███████▊  | 9133/11615 [00:26<00:07, 354.10 examples/s]
Map:  78%|███████▊  | 9113/11615 [00:26<00:07, 348.53 examples/s]
Map:  80%|███████▉  | 9236/11615 [00:26<00:06, 369.90 examples/s]
Map:  78%|███████▊  | 9099/11615 [00:26<00:07, 345.04 examples/s]
Map:  79%|███████▉  | 9198/11615 [00:26<00:06, 355.94 examples/s]
Map:  79%|███████▉  | 9194/11615 [00:26<00:06, 347.38 examples/s]
Map:  80%|███████▉  | 9245/11615 [00:26<00:06, 363.64 examples/s]
Map:  79%|███████▉  | 9151/11615 [00:26<00:06, 354.74 examples/s]
Map:  79%|███████▊  | 9138/11615 [00:26<00:06, 355.00 examples/s]
Map:  80%|███████▉  | 9240/11615 [00:26<00:06, 368.58 examples/s]
Map:  79%|███████▉  | 9186/11615 [00:26<00:06, 350.37 examples/s]
Map:  80%|███████▉  | 9236/11615 [00:26<00:06, 362.93 examples/s]
Map:  80%|███████▉  | 9290/11615 [00:26<00:06, 361.63 examples/s]
Map:  80%|████████  | 9300/11615 [00:26<00:06, 358.31 examples/s]
Map:  79%|███████▉  | 9226/11615 [00:26<00:06, 361.57 examples/s]
Map:  79%|███████▉  | 9206/11615 [00:26<00:06, 355.65 examples/s]
Map:  80%|████████  | 9327/11615 [00:26<00:06, 358.50 examples/s]
Map:  79%|███████▉  | 9191/11615 [00:26<00:06, 348.69 examples/s]
Map:  80%|████████  | 9295/11615 [00:26<00:06, 360.75 examples/s]
Map:  80%|███████▉  | 9290/11615 [00:26<00:06, 357.56 examples/s]
Map:  80%|███████▉  | 9245/11615 [00:26<00:06, 362.92 examples/s]
Map:  81%|████████  | 9351/11615 [00:26<00:06, 349.52 examples/s]
Map:  79%|███████▉  | 9232/11615 [00:26<00:06, 361.89 examples/s]
Map:  80%|███████▉  | 9281/11615 [00:26<00:06, 357.80 examples/s]
Map:  80%|████████  | 9327/11615 [00:26<00:06, 355.54 examples/s]
Map:  81%|████████  | 9381/11615 [00:26<00:06, 353.95 examples/s]
Map:  80%|████████  | 9349/11615 [00:26<00:06, 351.86 examples/s]
Map:  81%|████████  | 9390/11615 [00:26<00:06, 355.16 examples/s]
Map:  80%|████████  | 9319/11615 [00:26<00:06, 357.34 examples/s]
Map:  80%|████████  | 9300/11615 [00:26<00:06, 357.25 examples/s]
Map:  81%|████████  | 9419/11615 [00:26<00:06, 354.49 examples/s]
Map:  80%|███████▉  | 9286/11615 [00:26<00:06, 357.26 examples/s]
Map:  81%|████████  | 9387/11615 [00:26<00:06, 355.43 examples/s]
Map:  81%|████████  | 9380/11615 [00:26<00:06, 351.80 examples/s]
Map:  81%|████████  | 9426/11615 [00:26<00:06, 355.53 examples/s]
Map:  81%|████████▏ | 9456/11615 [00:26<00:06, 354.05 examples/s]
Map:  81%|████████  | 9372/11615 [00:26<00:06, 352.18 examples/s]
Map:  81%|████████  | 9351/11615 [00:26<00:06, 348.27 examples/s]
Map:  81%|████████▏ | 9464/11615 [00:26<00:05, 359.55 examples/s]
Map:  80%|████████  | 9336/11615 [00:26<00:06, 347.72 examples/s]
Map:  81%|████████▏ | 9442/11615 [00:26<00:06, 356.26 examples/s]
Map:  81%|████████  | 9434/11615 [00:26<00:06, 352.92 examples/s]
Map:  82%|████████▏ | 9493/11615 [00:26<00:05, 354.71 examples/s]
Map:  81%|████████  | 9409/11615 [00:26<00:06, 352.24 examples/s]
Map:  81%|████████  | 9390/11615 [00:26<00:06, 353.32 examples/s]
Map:  81%|████████  | 9374/11615 [00:26<00:06, 352.99 examples/s]
Map:  82%|████████▏ | 9480/11615 [00:26<00:05, 357.00 examples/s]
Map:  82%|████████▏ | 9472/11615 [00:26<00:06, 356.84 examples/s]
Map:  82%|████████▏ | 9522/11615 [00:26<00:05, 362.08 examples/s]
Map:  82%|████████▏ | 9532/11615 [00:26<00:05, 362.43 examples/s]
Map:  81%|████████▏ | 9445/11615 [00:26<00:06, 352.95 examples/s]
Map:  81%|████████  | 9426/11615 [00:26<00:06, 353.14 examples/s]
Map:  81%|████████  | 9410/11615 [00:26<00:06, 349.71 examples/s]
Map:  82%|████████▏ | 9518/11615 [00:26<00:05, 361.13 examples/s]
Map:  82%|████████▏ | 9510/11615 [00:26<00:05, 354.95 examples/s]
Map:  82%|████████▏ | 9482/11615 [00:26<00:06, 353.27 examples/s]
Map:  81%|████████▏ | 9464/11615 [00:26<00:06, 356.01 examples/s]
Map:  82%|████████▏ | 9575/11615 [00:27<00:05, 350.30 examples/s]
Map:  83%|████████▎ | 9583/11615 [00:27<00:05, 347.15 examples/s]
Map:  81%|████████▏ | 9450/11615 [00:26<00:06, 351.63 examples/s]
Map:  82%|████████▏ | 9547/11615 [00:27<00:05, 355.61 examples/s]
Map:  82%|████████▏ | 9568/11615 [00:27<00:05, 348.21 examples/s]
Map:  82%|████████▏ | 9521/11615 [00:27<00:05, 356.75 examples/s]
Map:  82%|████████▏ | 9500/11615 [00:27<00:06, 349.11 examples/s]
Map:  83%|████████▎ | 9619/11615 [00:27<00:05, 349.33 examples/s]
Map:  82%|████████▏ | 9488/11615 [00:27<00:05, 357.17 examples/s]
Map:  83%|████████▎ | 9631/11615 [00:27<00:05, 352.44 examples/s]
Map:  83%|████████▎ | 9599/11615 [00:27<00:05, 348.14 examples/s]
Map:  82%|████████▏ | 9539/11615 [00:27<00:05, 359.26 examples/s]
Map:  82%|████████▏ | 9526/11615 [00:27<00:05, 360.63 examples/s]
Map:  83%|████████▎ | 9620/11615 [00:27<00:05, 345.05 examples/s]
Map:  82%|████████▏ | 9570/11615 [00:27<00:05, 343.59 examples/s]
Map:  83%|████████▎ | 9673/11615 [00:27<00:05, 351.25 examples/s]
Map:  83%|████████▎ | 9667/11615 [00:27<00:05, 351.46 examples/s]
Map:  83%|████████▎ | 9635/11615 [00:27<00:05, 346.14 examples/s]
Map:  83%|████████▎ | 9659/11615 [00:27<00:05, 349.67 examples/s]
Map:  83%|████████▎ | 9607/11615 [00:27<00:05, 345.50 examples/s]
Map:  83%|████████▎ | 9589/11615 [00:27<00:05, 345.81 examples/s]
Map:  84%|████████▎ | 9713/11615 [00:27<00:05, 360.19 examples/s]
Map:  84%|████████▎ | 9707/11615 [00:27<00:05, 359.31 examples/s]
Map:  82%|████████▏ | 9576/11615 [00:27<00:05, 344.92 examples/s]
Map:  83%|████████▎ | 9672/11615 [00:27<00:05, 350.42 examples/s]
Map:  83%|████████▎ | 9697/11615 [00:27<00:05, 353.08 examples/s]
Map:  83%|████████▎ | 9625/11615 [00:27<00:05, 347.04 examples/s]
Map:  84%|████████▍ | 9744/11615 [00:27<00:05, 357.95 examples/s]
Map:  84%|████████▍ | 9750/11615 [00:27<00:05, 355.54 examples/s]
Map:  83%|████████▎ | 9660/11615 [00:27<00:05, 343.38 examples/s]
Map:  84%|████████▎ | 9713/11615 [00:27<00:05, 361.02 examples/s]
Map:  83%|████████▎ | 9631/11615 [00:27<00:05, 348.40 examples/s]
Map:  84%|████████▍ | 9737/11615 [00:27<00:05, 362.24 examples/s]
Map:  83%|████████▎ | 9660/11615 [00:27<00:05, 341.94 examples/s]
Map:  84%|████████▍ | 9787/11615 [00:27<00:05, 355.95 examples/s]
Map:  84%|████████▍ | 9780/11615 [00:27<00:05, 353.07 examples/s]
Map:  84%|████████▎ | 9700/11615 [00:27<00:05, 351.61 examples/s]
Map:  84%|████████▍ | 9750/11615 [00:27<00:05, 355.75 examples/s]
Map:  84%|████████▎ | 9700/11615 [00:27<00:05, 350.97 examples/s]
Map:  85%|████████▍ | 9823/11615 [00:27<00:05, 355.45 examples/s]
Map:  85%|████████▍ | 9818/11615 [00:27<00:05, 358.67 examples/s]
Map:  83%|████████▎ | 9683/11615 [00:27<00:05, 345.84 examples/s]
Map:  84%|████████▍ | 9790/11615 [00:27<00:05, 349.54 examples/s]
Map:  84%|████████▍ | 9739/11615 [00:27<00:05, 357.57 examples/s]
Map:  84%|████████▍ | 9787/11615 [00:27<00:05, 355.54 examples/s]
Map:  84%|████████▍ | 9739/11615 [00:27<00:05, 360.81 examples/s]
Map:  85%|████████▍ | 9857/11615 [00:27<00:04, 366.03 examples/s]
Map:  85%|████████▍ | 9863/11615 [00:27<00:04, 364.24 examples/s]
Map:  84%|████████▎ | 9724/11615 [00:27<00:05, 358.68 examples/s]
Map:  85%|████████▍ | 9829/11615 [00:27<00:05, 356.75 examples/s]
Map:  85%|████████▍ | 9823/11615 [00:27<00:05, 354.96 examples/s]
Map:  84%|████████▍ | 9790/11615 [00:27<00:05, 343.90 examples/s]
Map:  85%|████████▌ | 9904/11615 [00:27<00:04, 373.62 examples/s]
Map:  85%|████████▌ | 9897/11615 [00:27<00:04, 370.97 examples/s]
Map:  85%|████████▍ | 9870/11615 [00:27<00:04, 364.28 examples/s]
Map:  84%|████████▍ | 9790/11615 [00:27<00:05, 346.26 examples/s]
Map:  85%|████████▍ | 9863/11615 [00:27<00:04, 363.86 examples/s]
Map:  84%|████████▍ | 9778/11615 [00:27<00:05, 352.88 examples/s]
Map:  85%|████████▍ | 9828/11615 [00:27<00:05, 351.04 examples/s]
Map:  85%|████████▌ | 9911/11615 [00:28<00:04, 372.47 examples/s]
Map:  85%|████████▍ | 9828/11615 [00:28<00:05, 352.88 examples/s]
Map:  85%|████████▌ | 9904/11615 [00:28<00:04, 372.95 examples/s]
Map:  86%|████████▌ | 9955/11615 [00:28<00:04, 359.00 examples/s]
Map:  86%|████████▌ | 9950/11615 [00:28<00:04, 360.65 examples/s]
Map:  84%|████████▍ | 9814/11615 [00:28<00:05, 352.38 examples/s]
Map:  85%|████████▍ | 9869/11615 [00:28<00:04, 363.00 examples/s]
Map:  85%|████████▍ | 9869/11615 [00:28<00:04, 364.79 examples/s]
Map:  86%|████████▌ | 9992/11615 [00:28<00:04, 359.07 examples/s]
Map:  86%|████████▌ | 9988/11615 [00:28<00:04, 363.33 examples/s]
Map:  85%|████████▍ | 9850/11615 [00:28<00:05, 352.84 examples/s]
Map:  86%|████████▌ | 9963/11615 [00:28<00:04, 358.42 examples/s]
Map:  85%|████████▌ | 9909/11615 [00:28<00:04, 369.28 examples/s]
Map:  86%|████████▌ | 9955/11615 [00:28<00:04, 358.18 examples/s]
Map:  85%|████████▌ | 9909/11615 [00:28<00:04, 371.00 examples/s]
Map:  86%|████████▋ | 10030/11615 [00:28<00:04, 358.15 examples/s]
Map:  85%|████████▌ | 9890/11615 [00:28<00:04, 362.41 examples/s]
Map:  86%|████████▌ | 10000/11615 [00:28<00:04, 356.01 examples/s]
Map:  86%|████████▌ | 9992/11615 [00:28<00:04, 358.42 examples/s]
Map:  86%|████████▋ | 10041/11615 [00:28<00:04, 357.85 examples/s]
Map:  86%|████████▌ | 9960/11615 [00:28<00:04, 350.00 examples/s]
Map:  87%|████████▋ | 10070/11615 [00:28<00:04, 361.34 examples/s]
Map:  86%|████████▋ | 10030/11615 [00:28<00:04, 357.36 examples/s]
Map:  86%|████████▋ | 10040/11615 [00:28<00:04, 359.58 examples/s]
Map:  86%|████████▌ | 9960/11615 [00:28<00:04, 352.16 examples/s]
Map:  87%|████████▋ | 10080/11615 [00:28<00:04, 361.41 examples/s]
Map:  86%|████████▌ | 9943/11615 [00:28<00:04, 357.36 examples/s]
Map:  86%|████████▌ | 9999/11615 [00:28<00:04, 355.52 examples/s]
Map:  87%|████████▋ | 10109/11615 [00:28<00:04, 362.79 examples/s]
Map:  86%|████████▌ | 9999/11615 [00:28<00:04, 357.42 examples/s]
Map:  87%|████████▋ | 10117/11615 [00:28<00:04, 359.61 examples/s]
Map:  87%|████████▋ | 10070/11615 [00:28<00:04, 360.52 examples/s]
Map:  87%|████████▋ | 10080/11615 [00:28<00:04, 362.01 examples/s]
Map:  86%|████████▌ | 9980/11615 [00:28<00:04, 355.68 examples/s]
Map:  87%|████████▋ | 10054/11615 [00:28<00:04, 354.79 examples/s]
Map:  87%|████████▋ | 10117/11615 [00:28<00:04, 359.53 examples/s]
Map:  87%|████████▋ | 10109/11615 [00:28<00:04, 361.92 examples/s]
Map:  87%|████████▋ | 10161/11615 [00:28<00:04, 354.27 examples/s]
Map:  87%|████████▋ | 10054/11615 [00:28<00:04, 356.16 examples/s]
Map:  88%|████████▊ | 10170/11615 [00:28<00:04, 353.35 examples/s]
Map:  86%|████████▋ | 10033/11615 [00:28<00:04, 353.85 examples/s]
Map:  87%|████████▋ | 10092/11615 [00:28<00:04, 358.38 examples/s]
Map:  87%|████████▋ | 10092/11615 [00:28<00:04, 358.90 examples/s]
Map:  87%|████████▋ | 10161/11615 [00:28<00:04, 353.41 examples/s]
Map:  88%|████████▊ | 10170/11615 [00:28<00:04, 352.87 examples/s]
Map:  87%|████████▋ | 10071/11615 [00:28<00:04, 356.77 examples/s]
Map:  88%|████████▊ | 10212/11615 [00:28<00:04, 346.97 examples/s]
Map:  88%|████████▊ | 10224/11615 [00:28<00:03, 352.22 examples/s]
Map:  87%|████████▋ | 10143/11615 [00:28<00:04, 347.26 examples/s]
Map:  88%|████████▊ | 10250/11615 [00:28<00:03, 352.02 examples/s]
Map:  87%|████████▋ | 10109/11615 [00:28<00:04, 357.29 examples/s]
Map:  88%|████████▊ | 10263/11615 [00:28<00:03, 359.49 examples/s]
Map:  88%|████████▊ | 10211/11615 [00:28<00:04, 345.29 examples/s]
Map:  87%|████████▋ | 10143/11615 [00:28<00:04, 346.20 examples/s]
Map:  88%|████████▊ | 10224/11615 [00:28<00:03, 350.85 examples/s]
Map:  88%|████████▊ | 10179/11615 [00:28<00:04, 345.66 examples/s]
Map:  89%|████████▊ | 10290/11615 [00:28<00:03, 357.59 examples/s]
Map:  88%|████████▊ | 10262/11615 [00:29<00:03, 357.25 examples/s]
Map:  88%|████████▊ | 10250/11615 [00:29<00:03, 351.13 examples/s]
Map:  89%|████████▊ | 10300/11615 [00:29<00:03, 355.85 examples/s]
Map:  87%|████████▋ | 10161/11615 [00:29<00:04, 349.54 examples/s]
Map:  88%|████████▊ | 10196/11615 [00:29<00:04, 345.06 examples/s]
Map:  88%|████████▊ | 10215/11615 [00:29<00:04, 344.60 examples/s]
Map:  89%|████████▉ | 10331/11615 [00:29<00:03, 368.03 examples/s]
Map:  89%|████████▊ | 10299/11615 [00:29<00:03, 358.66 examples/s]
Map:  89%|████████▉ | 10340/11615 [00:29<00:03, 364.05 examples/s]
Map:  89%|████████▊ | 10290/11615 [00:29<00:03, 356.40 examples/s]
Map:  88%|████████▊ | 10253/11615 [00:29<00:03, 351.76 examples/s]
Map:  88%|████████▊ | 10234/11615 [00:29<00:03, 348.79 examples/s]
Map:  88%|████████▊ | 10211/11615 [00:29<00:04, 341.76 examples/s]
Map:  89%|████████▉ | 10370/11615 [00:29<00:03, 362.91 examples/s]
Map:  89%|████████▉ | 10339/11615 [00:29<00:03, 366.64 examples/s]
Map:  89%|████████▉ | 10379/11615 [00:29<00:03, 364.89 examples/s]
Map:  89%|████████▉ | 10331/11615 [00:29<00:03, 366.98 examples/s]
Map:  89%|████████▊ | 10290/11615 [00:29<00:03, 354.77 examples/s]
Map:  88%|████████▊ | 10272/11615 [00:29<00:03, 354.45 examples/s]
Map:  88%|████████▊ | 10250/11615 [00:29<00:03, 347.34 examples/s]
Map:  90%|████████▉ | 10411/11615 [00:29<00:03, 367.04 examples/s]
Map:  89%|████████▉ | 10376/11615 [00:29<00:03, 363.85 examples/s]
Map:  90%|████████▉ | 10419/11615 [00:29<00:03, 371.58 examples/s]
Map:  89%|████████▉ | 10370/11615 [00:29<00:03, 362.10 examples/s]
Map:  89%|████████▉ | 10330/11615 [00:29<00:03, 363.51 examples/s]
Map:  89%|████████▉ | 10310/11615 [00:29<00:03, 353.81 examples/s]
Map:  89%|████████▊ | 10289/11615 [00:29<00:03, 355.77 examples/s]
Map:  90%|████████▉ | 10449/11615 [00:29<00:03, 369.62 examples/s]
Map:  90%|████████▉ | 10414/11615 [00:29<00:03, 366.70 examples/s]
Map:  90%|████████▉ | 10410/11615 [00:29<00:03, 366.66 examples/s]
Map:  89%|████████▉ | 10368/11615 [00:29<00:03, 364.32 examples/s]
Map:  90%|█████████ | 10473/11615 [00:29<00:03, 364.61 examples/s]
Map:  89%|████████▉ | 10349/11615 [00:29<00:03, 362.16 examples/s]
Map:  89%|████████▉ | 10328/11615 [00:29<00:03, 364.32 examples/s]
Map:  90%|████████▉ | 10449/11615 [00:29<00:03, 368.31 examples/s]
Map:  90%|█████████ | 10502/11615 [00:29<00:03, 360.12 examples/s]
Map:  90%|█████████ | 10470/11615 [00:29<00:03, 361.05 examples/s]
Map:  90%|█████████ | 10510/11615 [00:29<00:03, 363.08 examples/s]
Map:  90%|████████▉ | 10406/11615 [00:29<00:03, 363.17 examples/s]
Map:  90%|████████▉ | 10403/11615 [00:29<00:03, 358.74 examples/s]
Map:  89%|████████▉ | 10381/11615 [00:29<00:03, 355.28 examples/s]
Map:  91%|█████████ | 10540/11615 [00:29<00:03, 357.43 examples/s]
Map:  90%|████████▉ | 10445/11615 [00:29<00:03, 368.89 examples/s]
Map:  90%|█████████ | 10509/11615 [00:29<00:03, 365.71 examples/s]
Map:  90%|█████████ | 10502/11615 [00:29<00:03, 359.23 examples/s]
Map:  90%|████████▉ | 10442/11615 [00:29<00:03, 365.09 examples/s]
Map:  91%|█████████ | 10563/11615 [00:29<00:02, 354.23 examples/s]
Map:  90%|████████▉ | 10421/11615 [00:29<00:03, 363.60 examples/s]
Map:  91%|█████████ | 10577/11615 [00:29<00:02, 354.85 examples/s]
Map:  91%|█████████ | 10540/11615 [00:29<00:03, 356.77 examples/s]
Map:  90%|█████████ | 10497/11615 [00:29<00:03, 358.79 examples/s]
Map:  91%|█████████ | 10562/11615 [00:29<00:02, 355.94 examples/s]
Map:  91%|█████████▏| 10603/11615 [00:29<00:02, 363.11 examples/s]
Map:  90%|█████████ | 10459/11615 [00:29<00:03, 365.34 examples/s]
Map:  91%|█████████▏| 10616/11615 [00:29<00:02, 360.66 examples/s]
Map:  90%|█████████ | 10493/11615 [00:29<00:03, 354.54 examples/s]
Map:  91%|█████████ | 10577/11615 [00:29<00:02, 354.33 examples/s]
Map: 100%|██████████| 11615/11615 [00:46<00:00, 251.13 examples/s]
Map: 100%|██████████| 11615/11615 [00:47<00:00,  7.01 examples/s] 
Map: 100%|██████████| 11615/11615 [00:47<00:00, 245.71 examples/s]
Map: 100%|██████████| 11615/11615 [00:47<00:00,  8.28 examples/s] 
Map: 100%|██████████| 11615/11615 [00:47<00:00, 245.11 examples/s]
Map: 100%|██████████| 11615/11615 [00:47<00:00,  9.46 examples/s] 
Map: 100%|██████████| 11615/11615 [00:47<00:00, 244.66 examples/s]
Map: 100%|██████████| 11615/11615 [00:47<00:00,  9.04 examples/s] 
Map: 100%|██████████| 11615/11615 [00:47<00:00,  8.89 examples/s] 
Map: 100%|██████████| 11615/11615 [00:47<00:00, 243.30 examples/s]
Map: 100%|██████████| 11615/11615 [00:47<00:00, 243.43 examples/s]
Map: 100%|██████████| 11615/11615 [00:48<00:00,  6.36 examples/s] 
Map: 100%|██████████| 11615/11615 [00:48<00:00, 237.53 examples/s]
[2025-04-17 09:32:38,955] [INFO] [utils.py:781:see_memory_usage] After creating fp16 partitions: 2
[2025-04-17 09:32:38,956] [INFO] [utils.py:782:see_memory_usage] MA 3.74 GB         Max_MA 3.74 GB         CA 5.85 GB         Max_CA 6 GB 
[2025-04-17 09:32:38,956] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 28.9 GB, percent = 3.8%
[2025-04-17 09:32:39,068] [INFO] [utils.py:781:see_memory_usage] Before creating fp32 partitions
[2025-04-17 09:32:39,068] [INFO] [utils.py:782:see_memory_usage] MA 3.74 GB         Max_MA 3.74 GB         CA 5.85 GB         Max_CA 6 GB 
[2025-04-17 09:32:39,069] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 19.55 GB, percent = 2.6%
[2025-04-17 09:32:39,179] [INFO] [utils.py:781:see_memory_usage] After creating fp32 partitions
[2025-04-17 09:32:39,201] [INFO] [utils.py:782:see_memory_usage] MA 7.48 GB         Max_MA 9.35 GB         CA 11.46 GB         Max_CA 11 GB 
[2025-04-17 09:32:39,201] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 17.87 GB, percent = 2.4%
[2025-04-17 09:32:39,309] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states
[2025-04-17 09:32:39,310] [INFO] [utils.py:782:see_memory_usage] MA 7.48 GB         Max_MA 7.48 GB         CA 11.46 GB         Max_CA 11 GB 
[2025-04-17 09:32:39,310] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 17.88 GB, percent = 2.4%
[2025-04-17 09:32:39,434] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states
[2025-04-17 09:32:39,434] [INFO] [utils.py:782:see_memory_usage] MA 7.48 GB         Max_MA 11.22 GB         CA 15.2 GB         Max_CA 15 GB 
[2025-04-17 09:32:39,434] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 17.88 GB, percent = 2.4%
[2025-04-17 09:32:39,435] [INFO] [stage3.py:486:_setup_for_real_optimizer] optimizer state initialized
[2025-04-17 09:32:40,387] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer
[2025-04-17 09:32:40,388] [INFO] [utils.py:782:see_memory_usage] MA 10.28 GB         Max_MA 12.24 GB         CA 15.2 GB         Max_CA 15 GB 
[2025-04-17 09:32:40,388] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 17.88 GB, percent = 2.4%
[2025-04-17 09:32:40,388] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedZeroOptimizer_Stage3
[2025-04-17 09:32:40,388] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2025-04-17 09:32:40,388] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2025-04-17 09:32:40,388] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2025-04-17 09:32:40,389] [INFO] [config.py:997:print] DeepSpeedEngine configuration:
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   amp_enabled .................. False
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   amp_params ................... False
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   bfloat16_enabled ............. True
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   bfloat16_immediate_grad_update  False
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   checkpoint_parallel_write_pipeline  False
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   checkpoint_tag_validation_enabled  True
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   checkpoint_tag_validation_fail  False
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f12c24d8250>
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   communication_data_type ...... None
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   curriculum_enabled_legacy .... False
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   curriculum_params_legacy ..... False
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   data_efficiency_enabled ...... False
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   dataloader_drop_last ......... False
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   disable_allgather ............ False
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   dump_state ................... False
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   dynamic_loss_scale_args ...... None
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   eigenvalue_enabled ........... False
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   eigenvalue_gas_boundary_resolution  1
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   eigenvalue_layer_num ......... 0
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   eigenvalue_max_iter .......... 100
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   eigenvalue_stability ......... 1e-06
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   eigenvalue_tol ............... 0.01
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   eigenvalue_verbose ........... False
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   elasticity_enabled ........... False
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   fp16_auto_cast ............... None
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   fp16_enabled ................. False
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   fp16_master_weights_and_gradients  False
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   global_rank .................. 0
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   grad_accum_dtype ............. None
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   gradient_accumulation_steps .. 16
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   gradient_clipping ............ 1.0
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   gradient_predivide_factor .... 1.0
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   graph_harvesting ............. False
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2025-04-17 09:32:40,389] [INFO] [config.py:1001:print]   initial_dynamic_scale ........ 1
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   load_universal_checkpoint .... False
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   loss_scale ................... 1.0
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   memory_breakdown ............. False
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   mics_hierarchial_params_gather  False
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   mics_shard_size .............. -1
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   optimizer_legacy_fusion ...... False
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   optimizer_name ............... None
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   optimizer_params ............. None
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   pld_enabled .................. False
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   pld_params ................... False
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   prescale_gradients ........... False
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   scheduler_name ............... None
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   scheduler_params ............. None
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   seq_parallel_communication_data_type  torch.float32
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   sparse_attention ............. None
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   sparse_gradients_enabled ..... False
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   steps_per_print .............. inf
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   timers_config ................ enabled=True synchronized=True
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   train_batch_size ............. 128
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   train_micro_batch_size_per_gpu  1
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   use_data_before_expert_parallel_  False
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   use_node_local_storage ....... False
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   wall_clock_breakdown ......... False
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   weight_quantization_config ... None
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   world_size ................... 8
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   zero_allow_untested_optimizer  True
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False, ratio=1.0) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=True use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   zero_enabled ................. True
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   zero_force_ds_cpu_optimizer .. True
[2025-04-17 09:32:40,390] [INFO] [config.py:1001:print]   zero_optimization_stage ...... 3
[2025-04-17 09:32:40,390] [INFO] [config.py:987:print_user_config]   json = {
    "train_batch_size": 128, 
    "train_micro_batch_size_per_gpu": 1, 
    "gradient_accumulation_steps": 16, 
    "zero_optimization": {
        "stage": 3, 
        "offload_optimizer": {
            "device": "none", 
            "nvme_path": null
        }, 
        "offload_param": {
            "device": "none", 
            "nvme_path": null
        }, 
        "stage3_gather_16bit_weights_on_model_save": true
    }, 
    "gradient_clipping": 1.0, 
    "steps_per_print": inf, 
    "bf16": {
        "enabled": true
    }, 
    "fp16": {
        "enabled": false
    }, 
    "zero_allow_untested_optimizer": true
}
[INFO|trainer.py:2134] 2025-04-17 09:32:40,391 >> ***** Running training *****
[INFO|trainer.py:2135] 2025-04-17 09:32:40,391 >>   Num examples = 11,615
[INFO|trainer.py:2136] 2025-04-17 09:32:40,391 >>   Num Epochs = 1
[INFO|trainer.py:2137] 2025-04-17 09:32:40,391 >>   Instantaneous batch size per device = 1
[INFO|trainer.py:2140] 2025-04-17 09:32:40,391 >>   Total train batch size (w. parallel, distributed & accumulation) = 128
[INFO|trainer.py:2141] 2025-04-17 09:32:40,391 >>   Gradient Accumulation steps = 16
[INFO|trainer.py:2142] 2025-04-17 09:32:40,391 >>   Total optimization steps = 90
[INFO|trainer.py:2143] 2025-04-17 09:32:40,392 >>   Number of trainable parameters = 8,030,261,248
  0%|          | 0/90 [00:00<?, ?it/s]
[WARNING|modeling_utils.py:1239] 2025-04-17 09:32:47,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:1239] 2025-04-17 09:32:47,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:1239] 2025-04-17 09:32:47,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:1239] 2025-04-17 09:32:47,933 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[2025-04-17 09:34:08,879] [WARNING] [stage3.py:2069:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
  1%|          | 1/90 [01:28<2:11:01, 88.33s/it]
{'loss': 0.6931, 'grad_norm': 13.118868708702406, 'learning_rate': 8.888888888888888e-08, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -0.44406357407569885, 'logps/chosen': -0.38893356919288635, 'logits/rejected': -0.5279894471168518, 'logits/chosen': -0.5667163133621216, 'epoch': 0.01}
  2%|▏         | 2/90 [02:49<2:03:21, 84.11s/it]
{'loss': 0.6931, 'grad_norm': 13.947230886816447, 'learning_rate': 1.7777777777777776e-07, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -0.5201272964477539, 'logps/chosen': -0.4288996160030365, 'logits/rejected': -0.6086758375167847, 'logits/chosen': -0.6076720952987671, 'epoch': 0.02}
  3%|▎         | 3/90 [04:14<2:02:53, 84.75s/it]
  4%|▍         | 4/90 [05:41<2:02:37, 85.55s/it]
{'loss': 0.6922, 'grad_norm': 12.776448326416984, 'learning_rate': 3.5555555555555553e-07, 'rewards/chosen': -0.0032942602410912514, 'rewards/rejected': -0.007766406051814556, 'rewards/accuracies': 0.5625, 'rewards/margins': 0.004472145810723305, 'logps/rejected': -0.4696415364742279, 'logps/chosen': -0.37240493297576904, 'logits/rejected': -0.4740648567676544, 'logits/chosen': -0.5709832906723022, 'epoch': 0.04}
  6%|▌         | 5/90 [07:06<2:00:49, 85.29s/it]
[2025-04-17 09:41:12,140] [WARNING] [stage3.py:2069:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
  7%|▋         | 6/90 [08:31<1:59:15, 85.19s/it]
{'loss': 0.6932, 'grad_norm': 12.68829414955742, 'learning_rate': 5.333333333333332e-07, 'rewards/chosen': -0.0019585667178034782, 'rewards/rejected': -0.000768774189054966, 'rewards/accuracies': 0.46875, 'rewards/margins': -0.0011897925287485123, 'logps/rejected': -0.43130457401275635, 'logps/chosen': -0.3867373466491699, 'logits/rejected': -0.5615403056144714, 'logits/chosen': -0.5552195906639099, 'epoch': 0.07}
  8%|▊         | 7/90 [09:54<1:56:42, 84.36s/it]
  9%|▉         | 8/90 [11:21<1:56:25, 85.19s/it]
{'loss': 0.6916, 'grad_norm': 13.057077881533889, 'learning_rate': 7.111111111111111e-07, 'rewards/chosen': -0.0007295235991477966, 'rewards/rejected': -0.005617556162178516, 'rewards/accuracies': 0.5625, 'rewards/margins': 0.00488803256303072, 'logps/rejected': -0.450803279876709, 'logps/chosen': -0.3987111747264862, 'logits/rejected': -0.45482489466667175, 'logits/chosen': -0.4382602572441101, 'epoch': 0.09}
 10%|█         | 9/90 [12:42<1:53:19, 83.94s/it]
 11%|█         | 10/90 [14:13<1:54:41, 86.02s/it]
{'loss': 0.6899, 'grad_norm': 15.305411576596303, 'learning_rate': 7.99699181001692e-07, 'rewards/chosen': -0.010051962919533253, 'rewards/rejected': -0.011776960454881191, 'rewards/accuracies': 0.5625, 'rewards/margins': 0.0017249980010092258, 'logps/rejected': -0.43802109360694885, 'logps/chosen': -0.36202603578567505, 'logits/rejected': -0.5882404446601868, 'logits/chosen': -0.5967720746994019, 'epoch': 0.11}
 12%|█▏        | 11/90 [15:36<1:52:16, 85.28s/it]
 13%|█▎        | 12/90 [17:02<1:51:01, 85.40s/it]
{'loss': 0.6868, 'grad_norm': 12.884120156299165, 'learning_rate': 7.972953430967771e-07, 'rewards/chosen': -0.03419490158557892, 'rewards/rejected': -0.043705083429813385, 'rewards/accuracies': 0.65625, 'rewards/margins': 0.009510181844234467, 'logps/rejected': -0.46348121762275696, 'logps/chosen': -0.43160080909729004, 'logits/rejected': -0.5779095888137817, 'logits/chosen': -0.6061941385269165, 'epoch': 0.13}
 14%|█▍        | 13/90 [18:28<1:50:03, 85.77s/it]
 16%|█▌        | 14/90 [19:52<1:47:41, 85.03s/it]
{'loss': 0.6795, 'grad_norm': 14.26134346874858, 'learning_rate': 7.925021242509539e-07, 'rewards/chosen': -0.047850728034973145, 'rewards/rejected': -0.08452175557613373, 'rewards/accuracies': 0.8125, 'rewards/margins': 0.036671023815870285, 'logps/rejected': -0.5124518275260925, 'logps/chosen': -0.4427701532840729, 'logits/rejected': -0.6748733520507812, 'logits/chosen': -0.6759158968925476, 'epoch': 0.15}
 17%|█▋        | 15/90 [21:17<1:46:12, 84.97s/it]
 18%|█▊        | 16/90 [22:41<1:44:26, 84.68s/it]
{'loss': 0.6691, 'grad_norm': 14.170102399113276, 'learning_rate': 7.853483514463521e-07, 'rewards/chosen': -0.05217953026294708, 'rewards/rejected': -0.12870296835899353, 'rewards/accuracies': 0.875, 'rewards/margins': 0.07652343809604645, 'logps/rejected': -0.4873093068599701, 'logps/chosen': -0.4023669958114624, 'logits/rejected': -0.6268794536590576, 'logits/chosen': -0.5950570702552795, 'epoch': 0.18}
[2025-04-17 09:56:48,619] [WARNING] [stage3.py:2069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
 19%|█▉        | 17/90 [24:08<1:43:51, 85.36s/it]
 20%|██        | 18/90 [25:32<1:42:06, 85.09s/it]
{'loss': 0.6641, 'grad_norm': 14.311643771283803, 'learning_rate': 7.758770483143634e-07, 'rewards/chosen': -0.14431723952293396, 'rewards/rejected': -0.21388906240463257, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.06957180798053741, 'logps/rejected': -0.4822029769420624, 'logps/chosen': -0.41072213649749756, 'logits/rejected': -0.6478306651115417, 'logits/chosen': -0.5682568550109863, 'epoch': 0.2}
 21%|██        | 19/90 [27:01<1:42:10, 86.35s/it]
 22%|██▏       | 20/90 [28:27<1:40:21, 86.02s/it]
{'loss': 0.6515, 'grad_norm': 14.78648866124662, 'learning_rate': 7.641451763864587e-07, 'rewards/chosen': -0.16291140019893646, 'rewards/rejected': -0.2518925666809082, 'rewards/accuracies': 0.75, 'rewards/margins': 0.08898115903139114, 'logps/rejected': -0.4958353042602539, 'logps/chosen': -0.4066483974456787, 'logits/rejected': -0.7652315497398376, 'logits/chosen': -0.7744452953338623, 'epoch': 0.22}
 23%|██▎       | 21/90 [29:50<1:38:06, 85.32s/it]
 24%|██▍       | 22/90 [31:17<1:37:03, 85.64s/it]
{'loss': 0.6411, 'grad_norm': 14.152241610096004, 'learning_rate': 7.502232925208364e-07, 'rewards/chosen': -0.2281702309846878, 'rewards/rejected': -0.3797053396701813, 'rewards/accuracies': 0.8125, 'rewards/margins': 0.15153509378433228, 'logps/rejected': -0.5093738436698914, 'logps/chosen': -0.4334159791469574, 'logits/rejected': -0.6482667922973633, 'logits/chosen': -0.6839113235473633, 'epoch': 0.24}
[2025-04-17 10:05:25,108] [WARNING] [stage3.py:2069:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
 26%|██▌       | 23/90 [32:44<1:36:13, 86.17s/it]
 27%|██▋       | 24/90 [34:07<1:33:52, 85.34s/it]
{'loss': 0.634, 'grad_norm': 14.605374502299147, 'learning_rate': 7.341951245651746e-07, 'rewards/chosen': -0.2152707278728485, 'rewards/rejected': -0.3378959000110626, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.1226251870393753, 'logps/rejected': -0.4742896258831024, 'logps/chosen': -0.38133612275123596, 'logits/rejected': -0.6561180949211121, 'logits/chosen': -0.6552442908287048, 'epoch': 0.26}
 28%|██▊       | 25/90 [35:35<1:33:09, 85.99s/it]
[2025-04-17 10:09:41,987] [WARNING] [stage3.py:2069:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
 29%|██▉       | 26/90 [37:01<1:31:43, 85.99s/it]
{'loss': 0.6131, 'grad_norm': 15.752356522610803, 'learning_rate': 7.161570678075036e-07, 'rewards/chosen': -0.25159555673599243, 'rewards/rejected': -0.4293104112148285, 'rewards/accuracies': 0.84375, 'rewards/margins': 0.17771483957767487, 'logps/rejected': -0.47099706530570984, 'logps/chosen': -0.3832031190395355, 'logits/rejected': -0.7421925067901611, 'logits/chosen': -0.7265116572380066, 'epoch': 0.29}
 30%|███       | 27/90 [38:27<1:30:22, 86.08s/it]
 31%|███       | 28/90 [39:51<1:28:07, 85.29s/it]
{'loss': 0.5944, 'grad_norm': 15.17209539688657, 'learning_rate': 6.962176052436018e-07, 'rewards/chosen': -0.3388962745666504, 'rewards/rejected': -0.5356280207633972, 'rewards/accuracies': 0.75, 'rewards/margins': 0.19673170149326324, 'logps/rejected': -0.45470133423805237, 'logps/chosen': -0.38757604360580444, 'logits/rejected': -0.7345452308654785, 'logits/chosen': -0.7692828178405762, 'epoch': 0.31}
 32%|███▏      | 29/90 [41:15<1:26:28, 85.06s/it]
 33%|███▎      | 30/90 [42:44<1:26:09, 86.17s/it]
{'loss': 0.5962, 'grad_norm': 14.261012961837183, 'learning_rate': 6.744966551474934e-07, 'rewards/chosen': -0.6278049945831299, 'rewards/rejected': -0.8038355112075806, 'rewards/accuracies': 0.625, 'rewards/margins': 0.1760304719209671, 'logps/rejected': -0.5537043213844299, 'logps/chosen': -0.4845855236053467, 'logits/rejected': -0.8926427960395813, 'logits/chosen': -0.824048638343811, 'epoch': 0.33}
 34%|███▍      | 31/90 [44:11<1:25:04, 86.52s/it]
 36%|███▌      | 32/90 [45:37<1:23:21, 86.24s/it]
{'loss': 0.5855, 'grad_norm': 17.87178228277289, 'learning_rate': 6.511248498688394e-07, 'rewards/chosen': -0.5609052181243896, 'rewards/rejected': -0.9495440125465393, 'rewards/accuracies': 0.8125, 'rewards/margins': 0.3886387050151825, 'logps/rejected': -0.5306280255317688, 'logps/chosen': -0.4317500591278076, 'logits/rejected': -0.9001604914665222, 'logits/chosen': -0.8501050472259521, 'epoch': 0.35}
 37%|███▋      | 33/90 [47:06<1:22:51, 87.22s/it]
 38%|███▊      | 34/90 [48:27<1:19:29, 85.17s/it]
{'loss': 0.5504, 'grad_norm': 18.355453479911887, 'learning_rate': 6.262427501946155e-07, 'rewards/chosen': -0.9470397233963013, 'rewards/rejected': -1.3090370893478394, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.36199724674224854, 'logps/rejected': -0.6247559785842896, 'logps/chosen': -0.582435667514801, 'logits/rejected': -0.9649584293365479, 'logits/chosen': -0.8647527694702148, 'epoch': 0.37}
 39%|███▉      | 35/90 [49:54<1:18:45, 85.91s/it]
[2025-04-17 10:24:01,476] [WARNING] [stage3.py:2069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
 40%|████      | 36/90 [51:20<1:17:20, 85.94s/it]
{'loss': 0.54, 'grad_norm': 15.963807106085282, 'learning_rate': 6e-07, 'rewards/chosen': -0.7385827302932739, 'rewards/rejected': -1.1326334476470947, 'rewards/accuracies': 0.75, 'rewards/margins': 0.394050657749176, 'logps/rejected': -0.539201021194458, 'logps/chosen': -0.4564329981803894, 'logits/rejected': -0.842654287815094, 'logits/chosen': -0.824530303478241, 'epoch': 0.4}
[2025-04-17 10:25:29,587] [WARNING] [stage3.py:2069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
 41%|████      | 37/90 [52:49<1:16:29, 86.59s/it]
 42%|████▏     | 38/90 [54:13<1:14:30, 85.96s/it]
{'loss': 0.5289, 'grad_norm': 17.84560733227794, 'learning_rate': 5.725544262725014e-07, 'rewards/chosen': -0.8527265191078186, 'rewards/rejected': -1.4350509643554688, 'rewards/accuracies': 0.875, 'rewards/margins': 0.5823245048522949, 'logps/rejected': -0.5730911493301392, 'logps/chosen': -0.4507395029067993, 'logits/rejected': -0.9467881917953491, 'logits/chosen': -0.861293375492096, 'epoch': 0.42}
[2025-04-17 10:28:21,839] [WARNING] [stage3.py:2069:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
 43%|████▎     | 39/90 [55:41<1:13:31, 86.50s/it]
 44%|████▍     | 40/90 [57:05<1:11:26, 85.73s/it]
{'loss': 0.4978, 'grad_norm': 16.513802657330032, 'learning_rate': 5.440710899218841e-07, 'rewards/chosen': -1.1461962461471558, 'rewards/rejected': -1.873557448387146, 'rewards/accuracies': 0.875, 'rewards/margins': 0.7273613810539246, 'logps/rejected': -0.6848773956298828, 'logps/chosen': -0.5149874687194824, 'logits/rejected': -0.9830992817878723, 'logits/chosen': -0.9551966190338135, 'epoch': 0.44}
 46%|████▌     | 41/90 [58:29<1:09:41, 85.35s/it]
 47%|████▋     | 42/90 [59:51<1:07:18, 84.14s/it]
{'loss': 0.4695, 'grad_norm': 17.248966838332684, 'learning_rate': 5.147212930844361e-07, 'rewards/chosen': -1.0791146755218506, 'rewards/rejected': -1.5966213941574097, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.5175067186355591, 'logps/rejected': -0.5896176695823669, 'logps/chosen': -0.4803640842437744, 'logits/rejected': -0.9597427845001221, 'logits/chosen': -0.9694040417671204, 'epoch': 0.46}
 48%|████▊     | 43/90 [1:01:15<1:05:53, 84.11s/it]
 49%|████▉     | 44/90 [1:02:39<1:04:28, 84.10s/it]
{'loss': 0.5101, 'grad_norm': 19.040158218231422, 'learning_rate': 4.846815488917643e-07, 'rewards/chosen': -1.4904017448425293, 'rewards/rejected': -2.356515645980835, 'rewards/accuracies': 0.90625, 'rewards/margins': 0.8661140203475952, 'logps/rejected': -0.7326344847679138, 'logps/chosen': -0.5798857808113098, 'logits/rejected': -1.0101063251495361, 'logits/chosen': -0.9522362947463989, 'epoch': 0.48}
 50%|█████     | 45/90 [1:04:04<1:03:16, 84.36s/it]
 51%|█████     | 46/90 [1:05:33<1:02:59, 85.90s/it]
{'loss': 0.4829, 'grad_norm': 19.219892380406787, 'learning_rate': 4.541325199000525e-07, 'rewards/chosen': -1.324497103691101, 'rewards/rejected': -2.019899368286133, 'rewards/accuracies': 0.78125, 'rewards/margins': 0.6954022645950317, 'logps/rejected': -0.664832353591919, 'logps/chosen': -0.5283989310264587, 'logits/rejected': -0.9733825325965881, 'logits/chosen': -0.94602370262146, 'epoch': 0.51}
 52%|█████▏    | 47/90 [1:06:56<1:00:59, 85.10s/it]
 53%|█████▎    | 48/90 [1:08:20<59:22, 84.83s/it]  
{'loss': 0.4484, 'grad_norm': 19.11258365646643, 'learning_rate': 4.232579315641903e-07, 'rewards/chosen': -1.4275002479553223, 'rewards/rejected': -2.367154836654663, 'rewards/accuracies': 0.875, 'rewards/margins': 0.939654529094696, 'logps/rejected': -0.6905348896980286, 'logps/chosen': -0.5142701864242554, 'logits/rejected': -0.977350652217865, 'logits/chosen': -1.0023889541625977, 'epoch': 0.53}
 54%|█████▍    | 49/90 [1:09:45<57:49, 84.62s/it]
 56%|█████▌    | 50/90 [1:11:12<57:01, 85.55s/it]
{'loss': 0.4559, 'grad_norm': 16.316139280265727, 'learning_rate': 3.922434672912703e-07, 'rewards/chosen': -1.6977019309997559, 'rewards/rejected': -2.678197145462036, 'rewards/accuracies': 0.8125, 'rewards/margins': 0.9804950952529907, 'logps/rejected': -0.7621501684188843, 'logps/chosen': -0.5877991914749146, 'logits/rejected': -1.0994668006896973, 'logits/chosen': -1.0198724269866943, 'epoch': 0.55}
 57%|█████▋    | 51/90 [1:12:38<55:37, 85.59s/it]
 58%|█████▊    | 52/90 [1:14:03<54:06, 85.44s/it]
{'loss': 0.4246, 'grad_norm': 18.48258508537425, 'learning_rate': 3.612756517187284e-07, 'rewards/chosen': -1.3701658248901367, 'rewards/rejected': -2.584001064300537, 'rewards/accuracies': 0.875, 'rewards/margins': 1.21383535861969, 'logps/rejected': -0.7321325540542603, 'logps/chosen': -0.49891898036003113, 'logits/rejected': -0.9723455905914307, 'logits/chosen': -0.9567621946334839, 'epoch': 0.57}
 59%|█████▉    | 53/90 [1:15:27<52:20, 84.87s/it]
[2025-04-17 10:49:37,583] [WARNING] [stage3.py:2069:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
 60%|██████    | 54/90 [1:16:57<51:49, 86.37s/it]
{'loss': 0.4235, 'grad_norm': 21.614885501356266, 'learning_rate': 3.3054072893322785e-07, 'rewards/chosen': -1.6857835054397583, 'rewards/rejected': -2.8496077060699463, 'rewards/accuracies': 0.78125, 'rewards/margins': 1.1638245582580566, 'logps/rejected': -0.7688043117523193, 'logps/chosen': -0.5853520631790161, 'logits/rejected': -0.9895510077476501, 'logits/chosen': -0.9601039290428162, 'epoch': 0.6}
 61%|██████    | 55/90 [1:18:21<50:02, 85.78s/it]
 62%|██████▏   | 56/90 [1:19:43<48:03, 84.81s/it]
{'loss': 0.4445, 'grad_norm': 21.57431461671107, 'learning_rate': 3.002235423768075e-07, 'rewards/chosen': -1.8550083637237549, 'rewards/rejected': -2.430619478225708, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.5756112337112427, 'logps/rejected': -0.687829852104187, 'logps/chosen': -0.6122458577156067, 'logits/rejected': -1.0879956483840942, 'logits/chosen': -0.9796087741851807, 'epoch': 0.62}
 63%|██████▎   | 57/90 [1:21:06<46:15, 84.10s/it]
 64%|██████▍   | 58/90 [1:22:31<45:03, 84.48s/it]
{'loss': 0.4108, 'grad_norm': 19.257075056757245, 'learning_rate': 2.705064231766716e-07, 'rewards/chosen': -1.6164586544036865, 'rewards/rejected': -2.8737118244171143, 'rewards/accuracies': 0.8125, 'rewards/margins': 1.2572534084320068, 'logps/rejected': -0.7279610633850098, 'logps/chosen': -0.5418040752410889, 'logits/rejected': -1.012742280960083, 'logits/chosen': -0.8683714866638184, 'epoch': 0.64}
 66%|██████▌   | 59/90 [1:23:59<44:10, 85.50s/it]
 67%|██████▋   | 60/90 [1:25:23<42:29, 84.97s/it]
{'loss': 0.4417, 'grad_norm': 17.226682494195806, 'learning_rate': 2.4156809358433725e-07, 'rewards/chosen': -1.6194758415222168, 'rewards/rejected': -3.017855167388916, 'rewards/accuracies': 0.78125, 'rewards/margins': 1.3983792066574097, 'logps/rejected': -0.7989769577980042, 'logps/chosen': -0.580237865447998, 'logits/rejected': -1.036994218826294, 'logits/chosen': -0.9898002743721008, 'epoch': 0.66}
 68%|██████▊   | 61/90 [1:26:47<40:55, 84.66s/it]
 69%|██████▉   | 62/90 [1:28:09<39:08, 83.87s/it]
{'loss': 0.439, 'grad_norm': 20.629092002279585, 'learning_rate': 2.1358259211898456e-07, 'rewards/chosen': -1.9807188510894775, 'rewards/rejected': -3.1719274520874023, 'rewards/accuracies': 0.78125, 'rewards/margins': 1.1912086009979248, 'logps/rejected': -0.7586198449134827, 'logps/chosen': -0.6048812866210938, 'logits/rejected': -1.061993956565857, 'logits/chosen': -0.9758970737457275, 'epoch': 0.68}
 70%|███████   | 63/90 [1:29:34<37:53, 84.22s/it]
[2025-04-17 11:03:38,623] [WARNING] [stage3.py:2069:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
 71%|███████   | 64/90 [1:30:58<36:25, 84.05s/it]
{'loss': 0.4413, 'grad_norm': 19.17234071494537, 'learning_rate': 1.8671822687932358e-07, 'rewards/chosen': -1.7550288438796997, 'rewards/rejected': -2.3974623680114746, 'rewards/accuracies': 0.75, 'rewards/margins': 0.642433226108551, 'logps/rejected': -0.6383996605873108, 'logps/chosen': -0.5687658190727234, 'logits/rejected': -1.0495095252990723, 'logits/chosen': -0.9591912627220154, 'epoch': 0.71}
 72%|███████▏  | 65/90 [1:32:20<34:49, 83.59s/it]
 73%|███████▎  | 66/90 [1:33:42<33:10, 82.95s/it]
{'loss': 0.4838, 'grad_norm': 22.14019455312053, 'learning_rate': 1.6113656331888562e-07, 'rewards/chosen': -2.271655797958374, 'rewards/rejected': -3.396193504333496, 'rewards/accuracies': 0.8125, 'rewards/margins': 1.124537706375122, 'logps/rejected': -0.8486902117729187, 'logps/chosen': -0.6505975127220154, 'logits/rejected': -1.063037395477295, 'logits/chosen': -0.9694862365722656, 'epoch': 0.73}
 74%|███████▍  | 67/90 [1:35:05<31:49, 83.00s/it]
 76%|███████▌  | 68/90 [1:36:27<30:19, 82.70s/it]
{'loss': 0.4444, 'grad_norm': 19.546391338008878, 'learning_rate': 1.3699145257237458e-07, 'rewards/chosen': -1.642604947090149, 'rewards/rejected': -2.590425729751587, 'rewards/accuracies': 0.875, 'rewards/margins': 0.9478207230567932, 'logps/rejected': -0.6791585087776184, 'logps/chosen': -0.5404257774353027, 'logits/rejected': -1.0038275718688965, 'logits/chosen': -0.9407296180725098, 'epoch': 0.75}
 77%|███████▋  | 69/90 [1:37:54<29:22, 83.94s/it]
 78%|███████▊  | 70/90 [1:39:17<27:58, 83.93s/it]
{'loss': 0.4082, 'grad_norm': 18.225850285260986, 'learning_rate': 1.1442810617684044e-07, 'rewards/chosen': -1.6669143438339233, 'rewards/rejected': -2.7286770343780518, 'rewards/accuracies': 0.8125, 'rewards/margins': 1.0617626905441284, 'logps/rejected': -0.7196572422981262, 'logps/chosen': -0.5312512516975403, 'logits/rejected': -0.9402793645858765, 'logits/chosen': -0.9426168203353882, 'epoch': 0.77}
 79%|███████▉  | 71/90 [1:40:43<26:44, 84.44s/it]
 80%|████████  | 72/90 [1:42:05<25:06, 83.68s/it]
{'loss': 0.4216, 'grad_norm': 19.42933612893159, 'learning_rate': 9.358222275240884e-08, 'rewards/chosen': -1.944184422492981, 'rewards/rejected': -3.125523805618286, 'rewards/accuracies': 0.8125, 'rewards/margins': 1.1813393831253052, 'logps/rejected': -0.7975329756736755, 'logps/chosen': -0.6025781035423279, 'logits/rejected': -1.0266480445861816, 'logits/chosen': -0.9661900401115417, 'epoch': 0.79}
[2025-04-17 11:16:11,069] [WARNING] [stage3.py:2069:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
 81%|████████  | 73/90 [1:43:30<23:49, 84.10s/it]
 82%|████████▏ | 74/90 [1:44:57<22:41, 85.09s/it]
{'loss': 0.4534, 'grad_norm': 21.38897409862689, 'learning_rate': 7.4579171894813e-08, 'rewards/chosen': -1.8805345296859741, 'rewards/rejected': -3.051450729370117, 'rewards/accuracies': 0.90625, 'rewards/margins': 1.1709163188934326, 'logps/rejected': -0.763328492641449, 'logps/chosen': -0.6077739596366882, 'logits/rejected': -1.0634962320327759, 'logits/chosen': -1.0327014923095703, 'epoch': 0.82}
 83%|████████▎ | 75/90 [1:46:22<21:13, 84.87s/it]
 84%|████████▍ | 76/90 [1:47:46<19:46, 84.77s/it]
{'loss': 0.4474, 'grad_norm': 24.710974653416642, 'learning_rate': 5.753324018789345e-08, 'rewards/chosen': -1.5360963344573975, 'rewards/rejected': -2.9876980781555176, 'rewards/accuracies': 0.875, 'rewards/margins': 1.4516016244888306, 'logps/rejected': -0.7649933695793152, 'logps/chosen': -0.5109491348266602, 'logits/rejected': -1.0278687477111816, 'logits/chosen': -0.998356819152832, 'epoch': 0.84}
 86%|████████▌ | 77/90 [1:49:09<18:12, 84.06s/it]
 87%|████████▋ | 78/90 [1:50:32<16:46, 83.85s/it]
{'loss': 0.4002, 'grad_norm': 21.2640579344066, 'learning_rate': 4.254694387063513e-08, 'rewards/chosen': -1.834343671798706, 'rewards/rejected': -3.0244972705841064, 'rewards/accuracies': 0.78125, 'rewards/margins': 1.19015371799469, 'logps/rejected': -0.7632676959037781, 'logps/chosen': -0.5742309093475342, 'logits/rejected': -1.108099102973938, 'logits/chosen': -0.9550021290779114, 'epoch': 0.86}
 88%|████████▊ | 79/90 [1:51:57<15:25, 84.17s/it]
 89%|████████▉ | 80/90 [1:53:21<14:00, 84.01s/it]
{'loss': 0.4503, 'grad_norm': 18.673607342269282, 'learning_rate': 2.9710412292443864e-08, 'rewards/chosen': -2.2027089595794678, 'rewards/rejected': -2.657236337661743, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.45452722907066345, 'logps/rejected': -0.7073379755020142, 'logps/chosen': -0.6759985685348511, 'logits/rejected': -1.1113317012786865, 'logits/chosen': -0.9732385277748108, 'epoch': 0.88}
 90%|█████████ | 81/90 [1:54:42<12:30, 83.34s/it]
 91%|█████████ | 82/90 [1:56:10<11:15, 84.48s/it]
{'loss': 0.4275, 'grad_norm': 20.66591410076854, 'learning_rate': 1.9100845864633875e-08, 'rewards/chosen': -1.7409307956695557, 'rewards/rejected': -2.579927444458008, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.8389963507652283, 'logps/rejected': -0.6631650924682617, 'logps/chosen': -0.5436615943908691, 'logits/rejected': -1.1206640005111694, 'logits/chosen': -0.9643717408180237, 'epoch': 0.9}
 92%|█████████▏| 83/90 [1:57:31<09:45, 83.63s/it]
 93%|█████████▎| 84/90 [1:58:54<08:21, 83.52s/it]
{'loss': 0.4226, 'grad_norm': 20.541487469626894, 'learning_rate': 1.0782051768070478e-08, 'rewards/chosen': -1.6365560293197632, 'rewards/rejected': -2.9360275268554688, 'rewards/accuracies': 0.8125, 'rewards/margins': 1.2994712591171265, 'logps/rejected': -0.7699326276779175, 'logps/chosen': -0.557407021522522, 'logits/rejected': -0.9977273941040039, 'logits/chosen': -0.9863231182098389, 'epoch': 0.93}
 94%|█████████▍| 85/90 [2:00:20<07:00, 84.03s/it]
 96%|█████████▌| 86/90 [2:01:41<05:32, 83.17s/it]
{'loss': 0.4322, 'grad_norm': 21.215365681656902, 'learning_rate': 4.804060209276395e-09, 'rewards/chosen': -2.0106406211853027, 'rewards/rejected': -3.2409169673919678, 'rewards/accuracies': 0.75, 'rewards/margins': 1.230276346206665, 'logps/rejected': -0.8258072137832642, 'logps/chosen': -0.6368429660797119, 'logits/rejected': -1.0424249172210693, 'logits/chosen': -0.9824389219284058, 'epoch': 0.95}
 97%|█████████▋| 87/90 [2:03:07<04:11, 83.96s/it]
 98%|█████████▊| 88/90 [2:04:29<02:46, 83.48s/it]
{'loss': 0.4434, 'grad_norm': 22.188781122055346, 'learning_rate': 1.2028235328831903e-09, 'rewards/chosen': -1.4882962703704834, 'rewards/rejected': -2.6088390350341797, 'rewards/accuracies': 0.71875, 'rewards/margins': 1.1205427646636963, 'logps/rejected': -0.6911481618881226, 'logps/chosen': -0.5142249464988708, 'logits/rejected': -0.9768092036247253, 'logits/chosen': -0.9498924016952515, 'epoch': 0.97}
[2025-04-17 11:38:32,325] [WARNING] [stage3.py:2069:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
 99%|█████████▉| 89/90 [2:05:51<01:23, 83.12s/it]
100%|██████████| 90/90 [2:07:12<00:00, 82.44s/it]
{'loss': 0.4415, 'grad_norm': 18.290515353238494, 'learning_rate': 0.0, 'rewards/chosen': -1.7044960260391235, 'rewards/rejected': -3.2121682167053223, 'rewards/accuracies': 0.78125, 'rewards/margins': 1.5076723098754883, 'logps/rejected': -0.7654825448989868, 'logps/chosen': -0.5427008867263794, 'logits/rejected': -1.024653673171997, 'logits/chosen': -0.9005169868469238, 'epoch': 0.99}
[INFO|trainer.py:3503] 2025-04-17 11:40:03,118 >> Saving model checkpoint to /data/username/grafting/saves/llama3-8b/full/dpo_math2/checkpoint-90
[INFO|configuration_utils.py:472] 2025-04-17 11:40:03,120 >> Configuration saved in /data/username/grafting/saves/llama3-8b/full/dpo_math2/checkpoint-90/config.json
[INFO|configuration_utils.py:807] 2025-04-17 11:40:03,120 >> Configuration saved in /data/username/grafting/saves/llama3-8b/full/dpo_math2/checkpoint-90/generation_config.json
[INFO|modeling_utils.py:2773] 2025-04-17 11:40:19,051 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at /data/username/grafting/saves/llama3-8b/full/dpo_math2/checkpoint-90/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2702] 2025-04-17 11:40:19,054 >> tokenizer config file saved in /data/username/grafting/saves/llama3-8b/full/dpo_math2/checkpoint-90/tokenizer_config.json
[INFO|tokenization_utils_base.py:2711] 2025-04-17 11:40:19,054 >> Special tokens file saved in /data/username/grafting/saves/llama3-8b/full/dpo_math2/checkpoint-90/special_tokens_map.json
[2025-04-17 11:40:19,668] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step90 is about to be saved!
[2025-04-17 11:40:19,675] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /data/username/grafting/saves/llama3-8b/full/dpo_math2/checkpoint-90/global_step90/zero_pp_rank_0_mp_rank_00_model_states.pt
[2025-04-17 11:40:19,675] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/username/grafting/saves/llama3-8b/full/dpo_math2/checkpoint-90/global_step90/zero_pp_rank_0_mp_rank_00_model_states.pt...
[2025-04-17 11:40:19,684] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/username/grafting/saves/llama3-8b/full/dpo_math2/checkpoint-90/global_step90/zero_pp_rank_0_mp_rank_00_model_states.pt.
[2025-04-17 11:40:19,688] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/username/grafting/saves/llama3-8b/full/dpo_math2/checkpoint-90/global_step90/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2025-04-17 11:40:40,014] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/username/grafting/saves/llama3-8b/full/dpo_math2/checkpoint-90/global_step90/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2025-04-17 11:40:40,014] [INFO] [engine.py:3478:_save_zero_checkpoint] zero checkpoint saved /data/username/grafting/saves/llama3-8b/full/dpo_math2/checkpoint-90/global_step90/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2025-04-17 11:40:40,772] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step90 is ready now!
[INFO|trainer.py:2394] 2025-04-17 11:40:40,776 >> Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 7680.3841, 'train_samples_per_second': 1.512, 'train_steps_per_second': 0.012, 'train_loss': 0.5258067965507507, 'epoch': 0.99}
100%|██████████| 90/90 [2:08:00<00:00, 85.34s/it]
***** train metrics *****
  epoch                    =     0.9917
  total_flos               =        0GF
  train_loss               =     0.5258
  train_runtime            = 2:08:00.38
  train_samples            =      11615
  train_samples_per_second =      1.512
  train_steps_per_second   =      0.012
2025-04-17 11:40:40 - INFO - __main__ - *** Training complete ***
2025-04-17 11:40:40 - INFO - __main__ - *** Save model ***
[INFO|trainer.py:3503] 2025-04-17 11:40:50,537 >> Saving model checkpoint to /data/username/grafting/saves/llama3-8b/full/dpo_math2
[INFO|configuration_utils.py:472] 2025-04-17 11:40:50,539 >> Configuration saved in /data/username/grafting/saves/llama3-8b/full/dpo_math2/config.json
[INFO|configuration_utils.py:807] 2025-04-17 11:40:50,539 >> Configuration saved in /data/username/grafting/saves/llama3-8b/full/dpo_math2/generation_config.json
[INFO|modeling_utils.py:2773] 2025-04-17 11:41:06,910 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at /data/username/grafting/saves/llama3-8b/full/dpo_math2/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2702] 2025-04-17 11:41:06,912 >> tokenizer config file saved in /data/username/grafting/saves/llama3-8b/full/dpo_math2/tokenizer_config.json
[INFO|tokenization_utils_base.py:2711] 2025-04-17 11:41:06,912 >> Special tokens file saved in /data/username/grafting/saves/llama3-8b/full/dpo_math2/special_tokens_map.json
2025-04-17 11:41:07 - INFO - __main__ - Model saved to /data/username/grafting/saves/llama3-8b/full/dpo_math2
2025-04-17 11:41:07 - INFO - __main__ - *** Training complete! ***
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
